Abstract

This paper deals with the consistency of the least squares estimator of a convex regression func- 
tion when the predictor is multidimensional. We characterize and discuss the computation of such an 
estimator via the solution of certain quadratic and linear programs. Mild sufficient conditions for the 
consistency of this estimator and its subdifferentials in fixed and stochastic design regression settings 
are provided. We also consider a regression function which is known to be convex and component- 
wise nonincreasing and discuss the characterization, computation and consistency of its least squares 
estimator. 



1 Introduction 

Consider a closed, convex set $\mathfrak{X} \subset \mathbb{R}^d$, for $d \ge 1$, with nonempty interior and a regression
model of the form

$$Y = \phi(X) + \epsilon \qquad (1)$$

where $X$ is an $\mathfrak{X}$-valued random vector, $\epsilon$ is a random variable with $E(\epsilon \mid X) = 0$, and
$\phi : \mathbb{R}^d \to \mathbb{R}$ is an unknown convex function. Given independent observations
$(X_1, Y_1), \ldots, (X_n, Y_n)$ from such a model, we wish to estimate $\phi$ by the method of least
squares, i.e., by finding a convex function $\hat{\phi}_n$ which minimizes the discrete $\mathcal{L}_2$ norm

$$\left( \sum_{k=1}^{n} |Y_k - \psi(X_k)|^2 \right)^{1/2}$$

among all convex functions $\psi$ defined on the convex hull of $X_1, \ldots, X_n$. In this paper we
characterize the least squares estimator, provide means for its computation, study its
finite sample properties and prove its consistency.

The problem just described is a nonparametric regression problem with known shape 
restriction (convexity). Such problems have a long history in the statistical literature 
with seminal papers like Brunk (1955), Grenander (1956) and Hildreth (1954) written 
more than 50 years ago, albeit in simpler settings. The former two papers deal with the 
estimation of monotone functions while the latter discusses least squares estimation of 
a concave function whose domain is a subset of the real line. Since then, many results 
on different nonparametric shape restricted regression problems have been published. 
For instance, Brunk (1970) and, more recently, Zhang (2002) have enriched the litera- 
ture concerning isotonic regression. In the particular case of convex regression, Hanson 
and Pledger (1976) proved the consistency of the least squares estimator introduced 
in Hildreth (1954). Some years later, Mammen (1991) and Groeneboom et al. (2001) 
derived, respectively, the rate of convergence and asymptotic distribution of this esti- 
mator. Some alternative methods of estimation that combine shape restrictions with 
smoothness assumptions have also been proposed for the one-dimensional case; see, 
for example, Birke and Dette (2006) where a kernel-based estimator is defined and its 
asymptotic distribution derived. 

Although the asymptotic theory of the one-dimensional convex regression problem 
is well understood, not much has been done in the multidimensional scenario. The
absence of a natural order structure in $\mathbb{R}^d$, for $d > 1$, poses a natural impediment in such
extensions. A convex function on the real line can be characterized as an absolutely
continuous function with increasing first derivative (see, for instance, Folland (1999),
Exercise 42.b, page 109). This characterization plays a key role in the computation and
asymptotic theory of the least squares estimator in the one-dimensional case. By con- 
trast, analogous results for convex functions of several variables involve more compli- 
cated characterizations using either second-order conditions (as in Dudley (1977), The- 
orem 3.1, page 163) or cyclical monotonicity (as in Rockafellar (1970), Theorems 24.8 
and 24.9, pages 238-239). Interesting differences between convex functions on $\mathbb{R}$ and
convex functions on $\mathbb{R}^d$ are given in Johansen (1974) and Bronstein (1978).

Recently there has been considerable interest in shape restricted function estimation
in multiple dimensions. In the density estimation context, Cule et al. (2010) deal with
the computation of the nonparametric maximum likelihood estimator of a multidimensional
log-concave density, while Cule and Samworth (2010), Schuhmacher et al.
(2009) and Schuhmacher and Dümbgen (2010) discuss its consistency and related issues.
Seregin and Wellner (2009) study the computation and consistency of the maximum
likelihood estimator of convex-transformed densities. This paper focuses on estimating
a regression function which is known to be convex. To the best of our knowledge
this is the first attempt to systematically study the characterization, computation, and 
consistency of the least squares estimator of a convex regression function with multidi- 
mensional covariates in a completely nonparametric setting. 

In the field of econometrics some work has been done on this multidimensional 
problem in less general contexts and with more stringent assumptions. Estimation 
of concave and/or componentwise nondecreasing functions has been treated, for in- 
stance, in Banker and Maindiratta (1992), Matzkin (1991), Matzkin (1993), Beresteanu 
(2007) and Allon et al. (2007). The first two papers define maximum likelihood esti- 
mators in semiparametric settings. The estimators in Matzkin (1991) and Banker and 
Maindiratta (1992) are shown to be consistent in Matzkin (1991) and Maindiratta and 
Sarath (1997), respectively. A maximum likelihood estimator and a sieved least squares 
estimator have been defined and techniques for their computation have been provided 
in Allon et al. (2007) and Beresteanu (2007), respectively.

The method of least squares has been applied to multidimensional concave regres- 
sion in Kuosmanen (2008). We take this work as our starting point. In agreement with 
the techniques used there, we define a least squares estimator which can be computed 
by solving a quadratic program. We argue that this estimator can be evaluated at a sin- 
gle point by finding the solution to a linear program. We then show that, under some 
mild regularity conditions, our estimator can be used to consistently estimate both the
convex function and its subdifferentials.

Our work goes beyond those mentioned above in the following ways: Our method
does not require any tuning parameter(s), whose selection is a major drawback of most
nonparametric regression methods, such as kernel-based procedures. The choice of the tuning
parameter(s) is especially problematic in higher dimensions, e.g., kernel-based methods
would require the choice of a $d \times d$ matrix of bandwidths. The sets of assumptions
that most authors have used to study the estimation of a multidimensional convex re- 
gression function are more restrictive and of a different nature than the ones in this pa- 
per. As opposed to the maximum likelihood approach used in Banker and Maindiratta 
(1992), Matzkin (1991), Allon et al. (2007) and Maindiratta and Sarath (1997), we prove 
the consistency of the estimator keeping the distribution of the errors unspecified; e.g., 
in the i.i.d. case we only assume that the errors have zero expectation and finite sec- 
ond moment. The estimators in Beresteanu (2007) are sieved least squares estimators 
and assume that the observed values of the predictors lie on equidistant grids of rect- 
angular domains. By contrast, our estimators are unsieved and our assumptions on the 
spatial arrangement of the predictor values are much more relaxed. In fact, we prove 
the consistency of the least squares estimator under both fixed and stochastic design 
settings; we also allow for heteroscedastic errors. In addition, we show that the least 
squares estimator can also be used to approximate the gradients and subdifferentials 
of the underlying convex function. 

It is hard to overstate the importance of convex functions in applied mathematics.
For instance, optimization problems with convex objective functions over convex sets 
appear in many applications. Thus, the question of accurately estimating a convex re- 
gression function is indeed interesting from a theoretical perspective. However, it turns 
out that convex regression is important for numerous reasons besides statistical curios- 
ity. Convexity also appears in many applied sciences. One such field of application is 
microeconomic theory. Production functions are often supposed to be concave and 
componentwise nondecreasing. In this context, concavity reflects decreasing marginal 
returns. Concavity also plays a role in the theory of rational choice since it is a common
assumption for utility functions, for which it represents decreasing marginal utility. The
interested reader can see Hildreth (1954), Varian (1982a) or Varian (1982b) for more
information regarding the importance of concavity/convexity in economic theory.

The paper is organized as follows. In Section 2 we discuss the estimation proce- 
dure, characterize the estimator and show how it can be computed by solving a positive 
semidefinite quadratic program and a linear program. Section 3 starts with a descrip- 
tion of the deterministic and stochastic design regression schemes. The statement and 
proof of our main results are also included in Section 3. In Section 4 we provide the 
proofs of the technical lemmas used to prove the main theorem. Section A, the Ap- 
pendix, contains some results from convex analysis and linear algebra that are used in 
the paper and may be of independent interest. 

2 Characterization and finite sample properties 

We start with some notation. For convenience, we will regard elements of the Euclidean
space $\mathbb{R}^m$ as column vectors and denote their components with upper indices, i.e., any
$z \in \mathbb{R}^m$ will be denoted as $z = (z^1, z^2, \ldots, z^m)$. The symbol $\overline{\mathbb{R}}$ will stand for the extended
real line. Additionally, for any set $A \subset \mathbb{R}^d$ we will denote by $\mathrm{Conv}(A)$ its convex hull
and we'll write $\mathrm{Conv}(X_1, \ldots, X_n)$ instead of $\mathrm{Conv}(\{X_1, \ldots, X_n\})$. Finally, we will use
$\langle \cdot, \cdot \rangle$ and $|\cdot|$ to denote the standard inner product and norm in Euclidean spaces, respectively.

For $\mathcal{X} = \{X_1, \ldots, X_n\} \subset \mathfrak{X} \subset \mathbb{R}^d$, consider the set $\mathcal{K}_{\mathcal{X}}$ of all vectors
$z = (z^1, \ldots, z^n)' \in \mathbb{R}^n$ for which there is a convex function $\psi : \mathfrak{X} \to \mathbb{R}$ such that
$\psi(X_j) = z^j$ for all $j = 1, \ldots, n$. Then, a necessary and sufficient condition for a convex
function $\psi$ to minimize the sum of squared errors is that $\psi(X_j) = Z_n^j$ for $j = 1, \ldots, n$, where

$$Z_n = \operatorname*{argmin}_{z \in \mathcal{K}_{\mathcal{X}}} \left\{ \sum_{k=1}^{n} |Y_k - z^k|^2 \right\}. \qquad (2)$$

The computation of the vector $Z_n$ is crucial for the estimation procedure. We will
show that such a vector exists and is unique. However, it should be noted that there
are many convex functions $\psi$ satisfying $\psi(X_j) = Z_n^j$ for all $j = 1, \ldots, n$. Although any of
these functions can play the role of the least squares estimator, there is one such function
which is easily evaluated in $\mathrm{Conv}(X_1, \ldots, X_n)$. For computational convenience, we
will define our least squares estimator $\hat{\phi}_n$ to be precisely this function and describe it
explicitly in (7) and the subsequent discussion.

In what follows we show that both the vector $Z_n$ and the least squares estimator
$\hat{\phi}_n$ are well-defined for any $n$ data points $(X_1, Y_1), \ldots, (X_n, Y_n)$. We will also provide
two characterizations of the set $\mathcal{K}_{\mathcal{X}}$ and show that the vector $Z_n$ can be computed by
solving a positive semidefinite quadratic program. Finally, we will prove that for any
$x \in \mathrm{Conv}(X_1, \ldots, X_n)$ one can obtain $\hat{\phi}_n(x)$ by solving a linear program.

2.1 Existence and uniqueness

We start with two characterizations of the set $\mathcal{K}_{\mathcal{X}}$. The developments here are similar
to those in Allon et al. (2007) and Kuosmanen (2008).

Lemma 2.1 (Primal Characterization) Let $z = (z^1, \ldots, z^n) \in \mathbb{R}^n$. Then, $z \in \mathcal{K}_{\mathcal{X}}$ if and
only if for every $j = 1, \ldots, n$, the following holds:

$$z^j = \inf \left\{ \sum_{k=1}^{n} \theta^k z^k : \sum_{k=1}^{n} \theta^k = 1,\ \sum_{k=1}^{n} \theta^k X_k = X_j,\ \theta \ge 0,\ \theta \in \mathbb{R}^n \right\}, \qquad (3)$$

where the inequality $\ge$ holds componentwise.
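Proof: Define the function $g : \mathbb{R}^d \to \overline{\mathbb{R}}$ by

$$g(x) = \inf \left\{ \sum_{k=1}^{n} \theta^k z^k : \sum_{k=1}^{n} \theta^k = 1,\ \sum_{k=1}^{n} \theta^k X_k = x,\ \theta \ge 0,\ \theta \in \mathbb{R}^n \right\} \qquad (4)$$

where we use the convention that $\inf(\emptyset) = +\infty$. By Lemma A.1 in the Appendix, $g$ is
convex and finite on the $X_j$'s. Hence, if $z$ satisfies (3) then $z^j = g(X_j)$ for every
$j = 1, \ldots, n$ and it follows that $z \in \mathcal{K}_{\mathcal{X}}$.

Conversely, assume that $z \in \mathcal{K}_{\mathcal{X}}$ and $g(X_j) \neq z^j$ for some $j$. Note that $g(X_k) \le z^k$ for
any $k$ from the definition of $g$. Thus, we may suppose that $g(X_j) < z^j$. As $z \in \mathcal{K}_{\mathcal{X}}$, there
is a convex function $\psi$ such that $\psi(X_k) = z^k$ for all $k = 1, \ldots, n$. Then, from the definition
of $g(X_j)$ there exists $\theta_0 \in \mathbb{R}^n$ with $\theta_0 \ge 0$ and $\theta_0^1 + \ldots + \theta_0^n = 1$ such that
$\sum_{k=1}^{n} \theta_0^k X_k = X_j$ and

$$\sum_{k=1}^{n} \theta_0^k \psi(X_k) = \sum_{k=1}^{n} \theta_0^k z^k < z^j = \psi(X_j) = \psi\left( \sum_{k=1}^{n} \theta_0^k X_k \right),$$

which leads to a contradiction because $\psi$ is convex. □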

We now provide an alternative characterization of the set $\mathcal{K}_{\mathcal{X}}$ based on the dual
problem to the linear program used in Lemma 2.1.

Lemma 2.2 (Dual Characterization) Let $z \in \mathbb{R}^n$. Then, $z \in \mathcal{K}_{\mathcal{X}}$ if and only if for any
$j = 1, \ldots, n$ we have

$$z^j = \sup \left\{ \langle \xi, X_j \rangle + \eta : \langle \xi, X_k \rangle + \eta \le z^k \ \ \forall\, k = 1, \ldots, n,\ \xi \in \mathbb{R}^d,\ \eta \in \mathbb{R} \right\}. \qquad (5)$$

Moreover, $z \in \mathcal{K}_{\mathcal{X}}$ if and only if there exist vectors $\xi_1, \ldots, \xi_n \in \mathbb{R}^d$ such that

$$\langle \xi_j, X_k - X_j \rangle \le z^k - z^j \quad \forall\, k, j \in \{1, \ldots, n\}. \qquad (6)$$

Proof: According to the primal characterization, $z \in \mathcal{K}_{\mathcal{X}}$ if and only if the linear programs
defined by (3) have the $z^j$'s as optimal values. The linear programs in (5) are
the dual problems to those in (3). Then, the duality theorem for linear programs (see
Luenberger (1984), page 89) implies that the $z^j$'s have to be the corresponding optimal
values to the programs in (5).

To prove the second assertion let us first assume that $z \in \mathcal{K}_{\mathcal{X}}$. For each $j \in \{1, \ldots, n\}$
take any solution $(\xi_j, \eta_j)$ to (5). Then by (5), $\eta_j = z^j - \langle \xi_j, X_j \rangle$ and the inequalities in
(6) follow immediately because we must have $\langle \xi_j, X_k \rangle + \eta_j \le z^k$ for any $k \in \{1, \ldots, n\}$.
Conversely, take $z \in \mathbb{R}^n$ and assume that there are $\xi_1, \ldots, \xi_n \in \mathbb{R}^d$ satisfying (6). Take any
$j \in \{1, \ldots, n\}$, $\eta_j = z^j - \langle \xi_j, X_j \rangle$ and $\theta$ to be the vector in $\mathbb{R}^n$ with components $\theta^k = \delta_{kj}$,
where $\delta_{kj}$ is the Kronecker $\delta$. It follows that $\langle \xi_j, X_k \rangle + \eta_j \le z^k$ for all $k = 1, \ldots, n$, so $(\xi_j, \eta_j)$ is
feasible for the linear program in (5). In addition, $\theta$ is feasible for the linear program in
(3), so the weak duality principle of linear programming (see Luenberger (1984), Lemma
1, page 89) implies that $\langle \xi, X_j \rangle + \eta \le z^j$ for any pair $(\xi, \eta)$ which is feasible for the problem
in the right-hand side of (5). We thus have that $z^j$ is an upper bound attained by
the feasible pair $(\xi_j, \eta_j)$ and hence (5) holds for all $j = 1, \ldots, n$. □
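
The inequalities in (6) are easy to check numerically for a given candidate pair of values and slopes. The following is a minimal sketch of such a check; the function name `satisfies_dual_inequalities`, the tolerance `tol` and the sample points are illustrative assumptions and do not come from the paper.

```python
def satisfies_dual_inequalities(X, z, xi, tol=1e-12):
    """Check <xi_j, X_k - X_j> <= z^k - z^j for all pairs (k, j), as in (6).

    X  : list of n points in R^d (tuples of floats)
    z  : list of n candidate function values
    xi : list of n candidate subgradients in R^d (tuples of floats)
    tol is a hypothetical numerical slack, not part of the paper.
    """
    n = len(X)
    for j in range(n):
        for k in range(n):
            inner = sum(s * (a - b) for s, a, b in zip(xi[j], X[k], X[j]))
            if inner > z[k] - z[j] + tol:
                return False
    return True


# Values and gradients of the convex function |x|^2 at a few points of R^2.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
values = [p[0] ** 2 + p[1] ** 2 for p in points]
slopes = [(2 * p[0], 2 * p[1]) for p in points]
convex_ok = satisfies_dual_inequalities(points, values, slopes)

# Negating values and slopes corresponds to a concave function, which violates (6).
concave_ok = satisfies_dual_inequalities(
    points, [-v for v in values], [(-s[0], -s[1]) for s in slopes]
)
```

Note that a failed check for one particular choice of slopes does not by itself show that $z$ lies outside the set: (6) only asks for the existence of some admissible vectors $\xi_1, \ldots, \xi_n$.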

Both the primal and dual characterizations are useful for our purposes. The primal
plays a key role in proving the existence and uniqueness of the least squares estimator. 
The dual is crucial for its computation. 

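Lemma 2.3 The set $\mathcal{K}_{\mathcal{X}}$ is a closed, convex cone in $\mathbb{R}^n$ and the vector $Z_n$ satisfying (2) is
uniquely defined.

Proof: That $\mathcal{K}_{\mathcal{X}}$ is a convex cone follows trivially from the definition of the set. Now,
if $z \notin \mathcal{K}_{\mathcal{X}}$, then there is $j \in \{1, \ldots, n\}$ for which $z^j > g(X_j)$ with the function $g$ defined
as in (4). Thus, there is $\theta_0 \in \mathbb{R}^n$ with $\theta_0 \ge 0$ and $\theta_0^1 + \ldots + \theta_0^n = 1$ such that
$\theta_0^1 X_1 + \ldots + \theta_0^n X_n = X_j$ and $\sum_{k=1}^{n} \theta_0^k z^k < z^j$. Setting
$\delta = \frac{1}{2} \left( z^j - \sum_{k=1}^{n} \theta_0^k z^k \right)$ it is easily seen that for
all $\zeta \in \prod_{k=1}^{n} (z^k - \delta, z^k + \delta)$ we still have $\sum_{k=1}^{n} \theta_0^k \zeta^k < \zeta^j$ and thus $\zeta \notin \mathcal{K}_{\mathcal{X}}$. Therefore we
have shown that for any $z \notin \mathcal{K}_{\mathcal{X}}$ there is a neighborhood $U$ of $z$ with $U \subset \mathbb{R}^n \setminus \mathcal{K}_{\mathcal{X}}$.
Therefore, $\mathcal{K}_{\mathcal{X}}$ is closed and the vector $Z_n$ is uniquely determined as the projection of
$(Y_1, \ldots, Y_n) \in \mathbb{R}^n$ onto the closed convex set $\mathcal{K}_{\mathcal{X}}$ (see Conway (1985), Theorem 2.5, page
9). □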



We are now in a position to define the least squares estimator. Given observations
$(X_1, Y_1), \ldots, (X_n, Y_n)$ from model (1), we take the nonparametric least squares estimator
to be the function $\hat{\phi}_n : \mathbb{R}^d \to \overline{\mathbb{R}}$ defined by

$$\hat{\phi}_n(x) = \inf \left\{ \sum_{k=1}^{n} \theta^k Z_n^k : \sum_{k=1}^{n} \theta^k = 1,\ \sum_{k=1}^{n} \theta^k X_k = x,\ \theta \ge 0,\ \theta \in \mathbb{R}^n \right\} \qquad (7)$$

for any $x \in \mathbb{R}^d$. Here we are taking the convention that $\inf(\emptyset) = +\infty$. This function is
well-defined because the vector $Z_n$ exists and is unique for the sample. The estimator
is, in fact, a polyhedral convex function (i.e., a convex function whose epigraph is a
polyhedral convex set; see Rockafellar (1970), page 172) and satisfies, as a consequence of Lemma
A.1,

$$\hat{\phi}_n(x) = \sup_{\psi \in \mathcal{K}_{\mathcal{X}, Z_n}} \{ \psi(x) \},$$

where $\mathcal{K}_{\mathcal{X}, Z_n}$ is the collection of all convex functions $\psi : \mathbb{R}^d \to \mathbb{R}$ such that $\psi(X_j) \le Z_n^j$
for all $j = 1, \ldots, n$. Thus, $\hat{\phi}_n$ is the largest convex function that never exceeds the $Z_n^j$'s.
It is immediate that $\hat{\phi}_n$ is indeed a convex function (as the supremum of any family of
convex functions is itself convex). The primal characterization of the set $\mathcal{K}_{\mathcal{X}}$ implies
that $\hat{\phi}_n(X_j) = Z_n^j$ for all $j = 1, \ldots, n$.
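
In dimension $d = 1$ the linear program in (7) has a simple geometric solution: the infimum over convex combinations of the points $(X_k, Z_n^k)$ whose first coordinate equals $x$ is the lower boundary of the convex hull of those points. The sketch below uses this fact to evaluate the estimator without a linear programming solver. It is only an illustration for $d = 1$; the function names `lower_hull` and `evaluate_estimator_1d` are hypothetical, and for $d > 1$ one would have to solve (7) with an LP solver, which is not shown here.

```python
import math


def lower_hull(pts):
    """Lower convex hull of points sorted by strictly increasing x (monotone chain)."""
    hull = []
    for p in pts:
        # Drop the last vertex while it lies on or above the segment hull[-2] -> p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def evaluate_estimator_1d(xs, zs, x):
    """Value of the infimum in (7) at x for d = 1; +inf outside the convex hull."""
    lowest = {}
    for xk, zk in zip(xs, zs):
        # Repeated design points: only the smallest value can matter for the infimum.
        lowest[xk] = min(zk, lowest.get(xk, math.inf))
    hull = lower_hull(sorted(lowest.items()))
    if x < hull[0][0] or x > hull[-1][0]:
        return math.inf  # empty feasible set, convention inf(empty set) = +inf
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            t = (x - x1) / (x2 - x1)
            return (1 - t) * y1 + t * y2
    return hull[0][1]  # the hull is a single point and x coincides with it


# Values taken from the convex function x^2 at the design points 0, 1, 2.
inside_value = evaluate_estimator_1d([0.0, 1.0, 2.0], [0.0, 1.0, 4.0], 1.5)
outside_value = evaluate_estimator_1d([0.0, 1.0, 2.0], [0.0, 1.0, 4.0], 3.0)
```

In this example `inside_value` is the linear interpolation between the points $(1, 1)$ and $(2, 4)$, and `outside_value` is $+\infty$ because $3$ lies outside the convex hull of the design points.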

2.2 Finite sample properties 

In the following lemma we state some of the most important finite sample properties
of the least squares estimator defined by (7).

Lemma 2.4 Let $\hat{\phi}_n$ be the least squares estimator obtained from the sample $(X_1, Y_1), \ldots, (X_n, Y_n)$.
Then,

(i) $\sum_{k=1}^{n} (\psi(X_k) - \hat{\phi}_n(X_k)) (Y_k - \hat{\phi}_n(X_k)) \le 0$ for any convex function $\psi$ which is finite on
$\mathrm{Conv}(X_1, \ldots, X_n)$;

(ii) $\sum_{k=1}^{n} \hat{\phi}_n(X_k) (Y_k - \hat{\phi}_n(X_k)) = 0$;

(iii) $\sum_{k=1}^{n} Y_k = \sum_{k=1}^{n} \hat{\phi}_n(X_k)$;

(iv) the set on which $\hat{\phi}_n < \infty$ is $\mathrm{Conv}(X_1, \ldots, X_n)$;

(v) for any $x \in \mathbb{R}^d$ the map $(X_1, \ldots, X_n, Y_1, \ldots, Y_n) \mapsto \hat{\phi}_n(x)$ is a Borel-measurable
function from $\mathbb{R}^{n(d+1)}$ into $\overline{\mathbb{R}}$.

Proof: Property (i) follows from Moreau's decomposition theorem, which can be stated
as:

Consider a closed convex set $\mathcal{C}$ on a Hilbert space $\mathcal{H}$ with inner product $\langle \cdot, \cdot \rangle$ and norm
$\| \cdot \|$. Then, for any $x \in \mathcal{H}$ there is only one vector $x_{\mathcal{C}} \in \mathcal{C}$ satisfying
$\| x - x_{\mathcal{C}} \| = \min_{\xi \in \mathcal{C}} \{ \| x - \xi \| \}$. The vector $x_{\mathcal{C}}$ is characterized by being the only element of
$\mathcal{C}$ for which the inequality $\langle \xi - x_{\mathcal{C}}, x - x_{\mathcal{C}} \rangle \le 0$ holds for every $\xi \in \mathcal{C}$ (see Moreau (1962) or
Song and Zhengjun (2004)).

Taking $\psi$ to be $\kappa \hat{\phi}_n$ and letting $\kappa$ vary through $(0, \infty)$ gives (ii) from (i). Similarly,
(iii) follows from (i) by letting $\psi$ be $\hat{\phi}_n \pm 1$. Property (iv) is obvious from the definition
of $\hat{\phi}_n$.

To see why (v) holds, we first argue that the map $(X_1, \ldots, X_n, Y_1, \ldots, Y_n) \mapsto Z_n$ is measurable.
This follows from the fact that $Z_n$ is the solution to a convex quadratic program
and thus can be found as a limit of sequences whose elements come from arithmetic
operations with $(X_1, \ldots, X_n, Y_1, \ldots, Y_n)$. Examples of such sequences are the ones produced
by active set methods, e.g., see Boland (1997); or by interior-point methods (see
Kapoor and Vaidya (1986) or Mehrotra and Sun (1990)). The measurability of $\hat{\phi}_n(x)$ follows
from a similar argument, since it is the optimal value of a linear program whose solution
can be obtained from arithmetic operations involving just $(X_1, \ldots, X_n, Y_1, \ldots, Y_n)$
and $Z_n$ (e.g., via the well-known simplex method; see Nocedal and Wright (1999), page
372 or Luenberger (1984), page 30). □
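
For convenience, the two substitutions used in the proof of (ii) and (iii) can be written out explicitly. This is only an expanded version of the argument above, in the notation of Lemma 2.4; both choices of $\psi$ are convex and finite on the convex hull of the design points, so (i) applies to them.

```latex
% (ii): take \psi = \kappa \hat{\phi}_n with \kappa > 0 in (i)
(\kappa - 1) \sum_{k=1}^{n} \hat{\phi}_n(X_k) \bigl( Y_k - \hat{\phi}_n(X_k) \bigr) \le 0
\quad \forall\, \kappa > 0 .
% \kappa > 1 shows the sum is \le 0, \kappa < 1 shows it is \ge 0, hence it equals 0.

% (iii): take \psi = \hat{\phi}_n \pm 1 in (i)
\pm \sum_{k=1}^{n} \bigl( Y_k - \hat{\phi}_n(X_k) \bigr) \le 0
\;\Longrightarrow\;
\sum_{k=1}^{n} Y_k = \sum_{k=1}^{n} \hat{\phi}_n(X_k) .
```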



2.3 Computation of the estimator 

Once the vector $Z_n$ defined in (2) has been obtained, the evaluation of $\hat{\phi}_n$ at a single
point $x$ can be carried out by solving the linear program in (7). Thus, we need to find
a way to compute $Z_n$. Here the dual characterization proves of vital importance,
since it allows us to compute $Z_n$ by solving a quadratic program.

Lemma 2.5 Consider the positive semidefinite quadratic program

$$\begin{array}{ll}
\min & \sum_{k=1}^{n} |Y_k - z^k|^2 \\
\text{subject to} & \langle \xi_k, X_j - X_k \rangle \le z^j - z^k \quad \forall\, k, j = 1, \ldots, n \\
& \xi_1, \ldots, \xi_n \in \mathbb{R}^d,\ z \in \mathbb{R}^n.
\end{array} \qquad (8)$$

Then, this program has a unique solution $Z_n$ in $z$, i.e., for any two solutions $(\xi_1, \ldots, \xi_n, z)$
and $(\tau_1, \ldots, \tau_n, \zeta)$ we have $z = \zeta = Z_n$. This solution $Z_n$ is the only vector in $\mathbb{R}^n$ which
satisfies (2).

Proof: From Lemma 2.2, if $(\xi_1, \ldots, \xi_n, z)$ belongs to the feasible set of this program, then
$z \in \mathcal{K}_{\mathcal{X}}$. Moreover, for any $z \in \mathcal{K}_{\mathcal{X}}$ there are $\xi_1, \ldots, \xi_n \in \mathbb{R}^d$ such that $(\xi_1, \ldots, \xi_n, z)$
belongs to the feasible set of the quadratic program. Since the objective function only
depends on $z$, solving the quadratic program is the same as getting the element of $\mathcal{K}_{\mathcal{X}}$
which is the closest to $Y$. This element is, of course, the uniquely defined $Z_n$ satisfying
(2). □

The quadratic program (8) is positive semidefinite. This implies certain computational
complexities, but most modern nonlinear programming solvers can handle this type of
optimization problem. Some examples of high-performance quadratic programming
solvers are CPLEX, LINDO, MOSEK and QPOPT. Here we present two simulated examples to
illustrate the computation of the estimator when $d = 2$. The first one, depicted in Figure 1a,
corresponds to the case where $\phi(x) = |x|^2$. Figure 1b shows the convex function estimator
when the regression function is the hyperplane $\phi(x) = -x^1 + x^2$. In both cases, $n = 256$ observations
were used and the errors were assumed to be i.i.d. from the standard normal distribution.
All the computations were carried out using the MOSEK optimization toolbox for
Matlab and the run time for each example was less than 2 minutes on a standard desktop
PC. We refer the reader to Kuosmanen (2008) for additional numerical examples
(although the examples there are for the estimation of concave, componentwise nondecreasing
functions, the computational complexities are the same).

Figure 1: The scatter plot and nonparametric least squares estimator of the convex regression function
when (a) $\phi(x) = |x|^2$ (left panel); (b) $\phi(x) = -x^1 + x^2$ (right panel).
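
To make the projection in (2) concrete without an external solver, the following sketch treats only the case $d = 1$ with strictly increasing design points, where a vector of values is compatible with a convex function exactly when its successive slopes are nondecreasing. It does not use the quadratic programming solvers named in the text: it swaps in Dykstra's cyclic projection algorithm, which converges to the projection onto an intersection of closed convex sets (here, one half-space per interior design point). The function name `convex_lse_1d` and the number of sweeps are illustrative assumptions.

```python
def convex_lse_1d(xs, ys, sweeps=2000):
    """Approximate projection of ys onto convex sequences over strictly increasing xs.

    Assumption (not from the paper): Dykstra's cyclic projections are used in
    place of a quadratic programming solver, and `sweeps` is a fixed budget.
    """
    n = len(ys)
    # One constraint vector a per interior point: <a, z> >= 0 says that the
    # slope to the right of xs[i] is at least the slope to its left.
    constraints = []
    for i in range(1, n - 1):
        left = 1.0 / (xs[i] - xs[i - 1])
        right = 1.0 / (xs[i + 1] - xs[i])
        constraints.append((i, (left, -(left + right), right)))
    z = [float(y) for y in ys]
    corrections = [[0.0, 0.0, 0.0] for _ in constraints]
    for _ in range(sweeps):
        for c, (i, a) in enumerate(constraints):
            # Dykstra step: undo the previous correction for this constraint,
            # then project onto its half-space again.
            w = [z[i - 1 + t] - corrections[c][t] for t in range(3)]
            dot = sum(a[t] * w[t] for t in range(3))
            step = min(0.0, dot) / sum(v * v for v in a)
            new = [w[t] - step * a[t] for t in range(3)]
            corrections[c] = [new[t] - w[t] for t in range(3)]
            z[i - 1:i + 2] = new
    return z


# A "tent" that is not convex: its projection is the constant sequence 1/3.
fitted = convex_lse_1d([0.0, 1.0, 2.0], [0.0, 1.0, 0.0])
# A sequence that is already convex is left unchanged.
unchanged = convex_lse_1d([0.0, 1.0, 2.0], [1.0, 0.0, 1.0])
```

Each constraint vector sums to zero, so every projection step preserves the sum of the fitted values; this is consistent with property (iii) of Lemma 2.4. For $d > 1$ the set of admissible vectors is no longer described by such simple slope constraints, and a solver for the program (8) would be needed.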



2.4 The componentwise nonincreasing case 

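We now consider the case where the regression function $\phi$ is assumed to be convex and
componentwise nonincreasing. The developments here are quite similar to those in the
convex case, so we omit some of the details. Given the observed values $(X_1, Y_1), \ldots, (X_n, Y_n)$,
we write $\mathcal{S}_{\mathcal{X}}$ for the collection of all vectors $z \in \mathbb{R}^n$ for which there is a convex,
componentwise nonincreasing function $\psi$ satisfying $\psi(X_j) = z^j$ for every $j = 1, \ldots, n$. We will
denote by $\mathbb{R}^d_+$ and $\mathbb{R}^d_-$, respectively, the nonnegative and nonpositive orthants of $\mathbb{R}^d$. We
now have the following characterizations.

Lemma 2.6 Let $z \in \mathbb{R}^n$. Then, $z \in \mathcal{S}_{\mathcal{X}}$ if and only if the following holds for every
$j = 1, \ldots, n$:

$$z^j = \inf \left\{ \sum_{k=1}^{n} \theta^k z^k : \sum_{k=1}^{n} \theta^k = 1,\ \vartheta + \sum_{k=1}^{n} \theta^k X_k = X_j,\ \theta \ge 0,\ \vartheta \ge 0,\ \theta \in \mathbb{R}^n,\ \vartheta \in \mathbb{R}^d \right\}.$$

Proof: The proof is very similar to that of Lemma 2.1. The difference is that we use
Lemma A.2 and the function

$$h(x) = \inf \left\{ \sum_{k=1}^{n} \theta^k z^k : \sum_{k=1}^{n} \theta^k = 1,\ \vartheta + \sum_{k=1}^{n} \theta^k X_k = x,\ \theta \ge 0,\ \theta \in \mathbb{R}^n,\ \vartheta \in \mathbb{R}^d_+ \right\}$$

instead of using Lemma A.1 and the function $g$. □

The analogous dual characterization here is given in the following lemma. Its proof
is just an application of the duality theorem of linear programming, so we omit it.

Lemma 2.7 Let $z \in \mathbb{R}^n$. Then, $z \in \mathcal{S}_{\mathcal{X}}$ if and only if for every $j = 1, \ldots, n$ we have

$$z^j = \sup \left\{ \langle \xi, X_j \rangle + \eta : \langle \xi, X_k \rangle + \eta \le z^k \ \ \forall\, k = 1, \ldots, n,\ \xi \in \mathbb{R}^d_-,\ \eta \in \mathbb{R} \right\}.$$

Moreover, $z \in \mathcal{S}_{\mathcal{X}}$ if and only if there exist vectors $\xi_1, \ldots, \xi_n \in \mathbb{R}^d_-$ such that

$$\langle \xi_j, X_k - X_j \rangle \le z^k - z^j \quad \forall\, k, j \in \{1, \ldots, n\}.$$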
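
Just as in the previous case, we can use both characterizations to show the existence
and uniqueness of the vector

$$W_n = \operatorname*{argmin}_{z \in \mathcal{S}_{\mathcal{X}}} \left\{ \sum_{k=1}^{n} |Y_k - z^k|^2 \right\}$$

and then define the nonparametric least squares estimator by

$$\hat{\psi}_n(x) = \inf \left\{ \sum_{k=1}^{n} \theta^k W_n^k : \sum_{k=1}^{n} \theta^k = 1,\ \vartheta + \sum_{k=1}^{n} \theta^k X_k = x,\ \theta \ge 0,\ \theta \in \mathbb{R}^n,\ \vartheta \in \mathbb{R}^d_+ \right\}.$$

Here, the vector $W_n$ can also be computed by solving the corresponding quadratic program

$$\begin{array}{ll}
\min & \sum_{k=1}^{n} |Y_k - z^k|^2 \\
\text{subject to} & \langle \xi_k, X_j - X_k \rangle \le z^j - z^k \quad \forall\, k, j = 1, \ldots, n \\
& \xi_1, \ldots, \xi_n \in \mathbb{R}^d_-,\ z \in \mathbb{R}^n,
\end{array}$$

which differs from the program (8) only in that here the $\xi_j$'s have to be nonpositive.
The estimator enjoys finite sample properties analogous to those listed in Lemma
2.4. For the sake of completeness, we include them in the following lemma.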

Lemma 2.8 Let $\hat{\psi}_n$ be the convex, componentwise nonincreasing least squares estimator
obtained from the sample $(X_1, Y_1), \ldots, (X_n, Y_n)$. Then,

(i) $\sum_{k=1}^{n} (\psi(X_k) - \hat{\psi}_n(X_k)) (Y_k - \hat{\psi}_n(X_k)) \le 0$ for any convex, componentwise nonincreasing
function $\psi$ which is finite on $\mathrm{Conv}(X_1, \ldots, X_n)$;

(ii) $\sum_{k=1}^{n} \hat{\psi}_n(X_k) (Y_k - \hat{\psi}_n(X_k)) = 0$;

(iii) $\sum_{k=1}^{n} Y_k = \sum_{k=1}^{n} \hat{\psi}_n(X_k)$;

(iv) the set on which $\hat{\psi}_n < \infty$ is $\mathrm{Conv}(X_1, \ldots, X_n) + \mathbb{R}^d_+$;

(v) for any $x \in \mathbb{R}^d$ the map $(X_1, \ldots, X_n, Y_1, \ldots, Y_n) \mapsto \hat{\psi}_n(x)$ is a Borel-measurable
function from $\mathbb{R}^{n(d+1)}$ into $\overline{\mathbb{R}}$.

3 Consistency of the least squares estimator

The main goal of this paper is to show that in an appropriate setting the nonparametric
least squares estimator $\hat{\phi}_n$ described above is consistent for estimating the convex
function $\phi$ on the set $\mathfrak{X}$. In this context, we will prove the consistency of $\hat{\phi}_n$ in both
fixed and stochastic design regression settings.

Before proceeding any further we would like to introduce some notation. For any
Borel set $\mathsf{X} \subset \mathbb{R}^d$ we will denote by $\mathcal{B}_{\mathsf{X}}$ the $\sigma$-algebra of Borel subsets of $\mathsf{X}$. Given a
sequence of events $(A_n)_{n=1}^{\infty}$ we will be using the notation $[A_n \text{ i.o.}]$ and $[A_n \text{ a.a.}]$ to denote
$\limsup_n A_n$ and $\liminf_n A_n$, respectively.

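Now, consider a convex function $f : \mathbb{R}^d \to \overline{\mathbb{R}}$. This function is said to be proper if
$f(x) > -\infty$ for every $x \in \mathbb{R}^d$. The effective domain of $f$, denoted by $\mathrm{Dom}(f)$, is the set
of points $x \in \mathbb{R}^d$ for which $f(x) < \infty$. The subdifferential of $f$ at a point $x \in \mathbb{R}^d$ is the set
$\partial f(x) \subset \mathbb{R}^d$ of all vectors $\xi$ satisfying the inequality

$$\langle \xi, h \rangle \le f(x + h) - f(x) \quad \forall\, h \in \mathbb{R}^d.$$

The elements of $\partial f(x)$ are called subgradients of $f$ at $x$ (see Rockafellar (1970)). For a
set $A \subset \mathbb{R}^d$ we denote by $A^\circ$, $\overline{A}$ and $\partial A$ its interior, closure and boundary, respectively.
We write $\mathrm{Ext}(A) = \mathbb{R}^d \setminus \overline{A}$ for the exterior of the set $A$ and $\mathrm{diam}(A) := \sup_{x, y \in A} |x - y|$ for
the diameter of $A$. We also use the sup-norm notation, i.e., for a function $g : \mathbb{R}^d \to \mathbb{R}$ we
write $\|g\|_{\mathsf{X}} = \sup_{x \in \mathsf{X}} |g(x)|$.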
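
The subgradient inequality can be probed numerically. The sketch below tests the inequality on a finite set of displacements $h$ for the Euclidean norm at the origin, whose subdifferential there is the closed unit ball. The function name `violates_subgradient_inequality`, the probe grid and the tolerance are illustrative assumptions; a finite probe can refute a candidate subgradient but can never certify one.

```python
import math


def violates_subgradient_inequality(f, x, xi, displacements, tol=1e-12):
    """Return True if some h in `displacements` has <xi, h> > f(x + h) - f(x)."""
    fx = f(x)
    for h in displacements:
        inner = sum(s * t for s, t in zip(xi, h))
        shifted = tuple(a + b for a, b in zip(x, h))
        if inner > f(shifted) - fx + tol:
            return True
    return False


def euclidean_norm(v):
    return math.sqrt(sum(t * t for t in v))


origin = (0.0, 0.0)
# Hypothetical probe set: a coarse grid of displacements in [-2, 2]^2.
probes = [(a / 2.0, b / 2.0) for a in range(-4, 5) for b in range(-4, 5)]

# (0.3, 0.4) has norm 0.5, so it is a subgradient of the norm at the origin.
inside = violates_subgradient_inequality(euclidean_norm, origin, (0.3, 0.4), probes)
# (2, 0) has norm 2 and fails the inequality, e.g., at h = (0.5, 0).
outside = violates_subgradient_inequality(euclidean_norm, origin, (2.0, 0.0), probes)
```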

To avoid measurability issues regarding some sets, especially those involving the random
set-valued functions $\{\partial \hat{\phi}_n(x)\}_{x \in \mathfrak{X}^\circ}$, we will use the symbols $P_*$ and $P^*$ to denote
inner and outer probabilities, respectively. We refer the reader to Van der Vaart and
Wellner (1996), pages 6-15, for the basic properties of inner and outer probabilities.
In this context, a sequence of (not necessarily measurable) functions $(\Psi_n)_{n=1}^{\infty}$ from a
probability space $(\Omega, \mathcal{F}, P)$ into $\mathbb{R}$ is said to converge to a function $\Psi$ almost surely (see
Van der Vaart and Wellner (1996), Definition 1.9.1-(iv), page 52), written $\Psi_n \xrightarrow{a.s.} \Psi$, if
$P_*(\Psi_n \to \Psi) = 1$. We will use the standard notation $P(A)$ for the probabilities of all
events $A$ whose measurability can be easily inferred from the measurability of the random
variables $\{\hat{\phi}_n(x)\}_{x \in \mathfrak{X}}$, established in Lemma 2.4.

Our main theorems hold for both fixed and stochastic design schemes, and the
proofs are very similar. They differ only in minor steps. Therefore, for the sake of simplicity,
we will denote the observed values of the regressor variables always with the
capital letters $X_n$. For any Borel set $\mathsf{X} \subset \mathbb{R}^d$, we write

$$N_n(\mathsf{X}) = \#\{1 \le j \le n : X_j \in \mathsf{X}\}.$$

The quantities $X_n$ and $N_n(\mathsf{X})$ are non-random under the fixed design but random under
the stochastic one.
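
As a small illustration of this counting notation, the sketch below computes the count and the fraction of design points falling in an open rectangle. The function name `count_in_rectangle` and the regular grid used as a fixed design are hypothetical choices made only for this example.

```python
def count_in_rectangle(points, lower, upper):
    """Number of points in the open rectangle prod_i (lower[i], upper[i])."""
    return sum(
        all(lo < c < hi for c, lo, hi in zip(p, lower, upper))
        for p in points
    )


# Hypothetical fixed design: midpoints of a regular 10 x 10 grid of cells in [0, 1]^2.
design = [((i + 0.5) / 10, (j + 0.5) / 10) for i in range(10) for j in range(10)]
n_inside = count_in_rectangle(design, lower=(0.0, 0.0), upper=(0.5, 0.5))
fraction = n_inside / len(design)
```

For this grid the fraction of points in the rectangle $(0, 0.5)^2$ equals the Lebesgue measure of that rectangle.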

3.1 Fixed Design 

In a "fixed design" regression setting we assume that the regressor values are non-random 
and that all the uncertainty in the model comes from the response variable. We will now 
list a set of assumptions for this type of design. The one-dimensional case has been 
proven, under different regularity conditions, in Hanson and Pledger (1976). 

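(A1) We assume that we have a sequence $(X_n, Y_n)_{n=1}^{\infty}$ satisfying

$$Y_k = \phi(X_k) + \epsilon_k$$

where $(\epsilon_n)_{n=1}^{\infty}$ is an i.i.d. sequence with $E[\epsilon_j] = 0$, $E[\epsilon_j^2] = \sigma^2 < \infty$ and
$\phi : \mathbb{R}^d \to \overline{\mathbb{R}}$ is a proper convex function.

(A2) The non-random sequence $(X_n)_{n=1}^{\infty}$ is contained in a closed, convex set $\mathfrak{X} \subset \mathbb{R}^d$
with $\mathfrak{X}^\circ \neq \emptyset$ and $\mathfrak{X} \subset \mathrm{Dom}(\phi)$.

(A3) We assume the existence of a Borel measure $\nu$ on $\mathfrak{X}$ satisfying:

(i) $\{\mathsf{X} \in \mathcal{B}_{\mathfrak{X}} : \nu(\mathsf{X}) = 0\} = \{\mathsf{X} \in \mathcal{B}_{\mathfrak{X}} : \mathsf{X} \text{ has Lebesgue measure } 0\}$.

(ii) $\frac{1}{n} N_n(\mathsf{X}) \to \nu(\mathsf{X})$ for any open rectangle $\mathsf{X} \subset \mathfrak{X}^\circ$.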

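Condition (A1) may be replaced by the following:

(A4) We assume that we have a sequence $(X_n, Y_n)_{n=1}^{\infty}$ satisfying

$$Y_k = \phi(X_k) + \epsilon_k$$

where $\phi : \mathbb{R}^d \to \overline{\mathbb{R}}$ is a proper convex function and $(\epsilon_n)_{n=1}^{\infty}$ is an independent
sequence of random variables satisfying

(i) $E(\epsilon_n) = 0 \ \forall\, n \in \mathbb{N}$ and $\liminf_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} E(|\epsilon_k|) > 0$.

(iii) $\sup_{n \in \mathbb{N}} \{E(\epsilon_n^2)\} < \infty$.

Under these conditions we define $\sigma^2 := \lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} E[\epsilon_k^2]$.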

The raison d'etre of condition (A4) is to allow the variance of the error terms to depend 
on the regressors. We make the distinction between (Al) and (A4) because in the case 
of i.i.d. errors it is enough to require a finite second moment to ensure consistency. 

3.2 Stochastic Design 

In this setting we assume that $(X_n, Y_n)_{n=1}^{\infty}$ is an i.i.d. sequence from some Borel
probability measure $\mu$ on $\mathbb{R}^{d+1}$. Here we make the following assumptions on the measure $\mu$:

(A5) There is a closed, convex set $\mathfrak{X} \subset \mathbb{R}^d$ with $\mathfrak{X}^\circ \neq \emptyset$ such that $\mu(\mathfrak{X} \times \mathbb{R}) = 1$. Also,

$$\int y^2 \, \mu(dx, dy) < \infty.$$

(A6) There is a proper convex function $\phi : \mathbb{R}^d \to \overline{\mathbb{R}}$ with $\mathfrak{X} \subset \mathrm{Dom}(\phi)$ such that whenever
$(X, Y) \sim \mu$ we have $E(Y - \phi(X) \mid X) = 0$ and $E(|Y - \phi(X)|^2) = \sigma^2 < \infty$. Thus, $\phi$
is the regression function.

(A7) Denoting by $\nu(\cdot) = \mu((\cdot) \times \mathbb{R})$ the $x$-marginal of $\mu$, we assume that

$$\{\mathsf{X} \in \mathcal{B}_{\mathfrak{X}} : \nu(\mathsf{X}) = 0\} = \{\mathsf{X} \in \mathcal{B}_{\mathfrak{X}} : \mathsf{X} \text{ has Lebesgue measure } 0\}.$$

We wish to point out some conclusions that one can draw from these assumptions.
Consider the class of functions



Then for any $\mathsf{X} \subset \mathfrak{X}$ the following holds

$$\int \psi(x) (y - \phi(x)) \, \mu(dx, dy) = 0 \quad \forall\, \psi \in \mathcal{K}_{\mu};$$

so we get that $\phi$ is in fact the element of $\mathcal{K}_{\mu}$ which is the closest to $Y$ in the Hilbert space
$\mathcal{L}_2(\mathfrak{X} \times \mathbb{R}, \mathcal{B}_{\mathfrak{X} \times \mathbb{R}}, \mu)$. This follows from Moreau's decomposition theorem (see the proof of
Lemma 2.4).

Additionally, conditions {A5-A7} allow for stochastic dependency between the error
variable $Y - \phi(X)$ and the regressor $X$. Although some level of dependency can be
accommodated under conditions {A2-A4}, the measure $\mu$ allows us to take into account some cases
which wouldn't fit in the fixed design setting (even by conditioning on the regressors).

3.3 Main results

We can now state the two main results of this paper. The first result shows that assuming
only the convexity of $\phi$, the least squares estimator can be used to consistently estimate
both $\phi$ and its subdifferentials $\partial \phi(x)$.

Theorem 3.1 Under any of {A1-A3}, {A2-A4} or {A5-A7} we have,

(i) $P\left( \sup_{x \in \mathsf{X}} \{|\hat{\phi}_n(x) - \phi(x)|\} \to 0 \text{ for any compact set } \mathsf{X} \subset \mathfrak{X}^\circ \right) = 1$.

(ii) For every $x \in \mathfrak{X}^\circ$ and every $\xi \in \mathbb{R}^d$

$$\limsup_{n \to \infty} \lim_{h \downarrow 0} \frac{\hat{\phi}_n(x + h\xi) - \hat{\phi}_n(x)}{h} \le \lim_{h \downarrow 0} \frac{\phi(x + h\xi) - \phi(x)}{h} \quad \text{almost surely.}$$

(iii) Denoting by $B$ the unit ball (w.r.t. the Euclidean norm) we have

$$P_*\left( \partial \hat{\phi}_n(x) \subset \partial \phi(x) + \epsilon B \ \text{a.a.} \right) = 1 \quad \forall\, \epsilon > 0,\ \forall\, x \in \mathfrak{X}^\circ.$$

(iv) If $\phi$ is differentiable at $x \in \mathfrak{X}^\circ$, then

$$\sup_{\xi \in \partial \hat{\phi}_n(x)} \{|\xi - \nabla \phi(x)|\} \xrightarrow{a.s.} 0.$$

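Our second result states that assuming differentiability of $\phi$ on the entire $\mathfrak{X}^\circ$ allows us
to use the subdifferentials of the least squares estimator to consistently estimate $\nabla \phi$
uniformly on compact subsets of $\mathfrak{X}^\circ$.

Theorem 3.2 If $\phi$ is differentiable on $\mathfrak{X}^\circ$, then under any of {A1-A3}, {A2-A4} or {A5-A7}
we have,

$$P_*\left( \sup_{x \in \mathsf{X},\ \xi \in \partial \hat{\phi}_n(x)} \{|\xi - \nabla \phi(x)|\} \to 0 \text{ for any compact set } \mathsf{X} \subset \mathfrak{X}^\circ \right) = 1.$$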



3.4 Proof of the main results

Before embarking on the proofs, one must notice that there are some statements which 
hold true under any of {A1-A3}, {A2-A4} or {A5-A7}. We list the most important ones 
below, since they'll be used later. 



• For any set $X \subset \mathfrak{X}$ we have

$$\frac{N_n(X)}{n} \xrightarrow{a.s.} \nu(X). \tag{9}$$

• The strong law of large numbers implies that for any Borel set $X \subset \mathfrak{X}$ with positive
Lebesgue measure we have

$$\frac{1}{n} \sum_{1 \le k \le n:\ X_k \in X} (Y_k - \phi(X_k)) \xrightarrow{a.s.} 0 \tag{10}$$

and also

$$\lim_{n \to \infty} \frac{1}{n} \sum_{1 \le k \le n} (Y_k - \phi(X_k))^2 = \sigma^2 \quad \text{a.s.} \tag{11}$$

We would like to point out that in the case of condition A4, A4-(iii) allows us to
obtain (10) from an application of a version of the strong law of large numbers for
uncorrelated random variables, as it appears in Chung (2001), page 108, Theorem
5.1.2. Similarly, condition A4-(ii) implies that we can apply a version of the strong law
of large numbers for independent random variables as in Williams (1991), Lemma
12.8, page 118 or in Folland (1999), Theorem 10.12, page 322 to obtain (11).

• For any Borel subset $X \subset \mathfrak{X}$ with positive Lebesgue measure,

$$\#\{ n \in \mathbb{N} : X_n \in X \} \overset{a.s.}{=} +\infty. \tag{12}$$

Proof of Theorem 3.1. We will only make distinctions among the design schemes in
the proof if we are using any property besides (9), (10), (11) or (12). For the sake of
clarity we divide the proof in steps.

Step I: We start by showing that for any set with positive Lebesgue measure there is a
uniform band around the regression function (over that set) such that $\hat{\phi}_n$ comes within
the band at least at one point for all but finitely many $n$'s. This fact is stated in the
following lemma (proved in Section 4.1).

Lemma 3.1 For any set $X \subset \mathfrak{X}$ with positive Lebesgue measure we have,

$$P\left( \inf_{x \in X} \{ |\hat{\phi}_n(x) - \phi(x)| \} > M \text{ i.o.} \right) = 0 \quad \forall\, M > \frac{\sigma}{\sqrt{\nu(X)}}.$$

Step II: The idea is now to use the convexity of both $\phi$ and $\hat{\phi}_n$ to show that the previous
result in fact implies that the sup-norm of $\hat{\phi}_n$ is uniformly bounded on compact subsets
of $\mathfrak{X}^\circ$. We achieve this goal in the following two lemmas (whose proofs are given in
Sections 4.2 and 4.3 respectively).

Lemma 3.2 Let $X \subset \mathfrak{X}^\circ$ be compact with positive Lebesgue measure. Then, there is a positive
real number $K_X$ such that

$$P\left( \inf_{x \in X} \{ \hat{\phi}_n(x) \} < -K_X \text{ i.o.} \right) = 0.$$

Lemma 3.3 Let $X \subset \mathfrak{X}^\circ$ be a compact set with positive Lebesgue measure. Then, there is
$K_X > 0$ such that

$$P\left( \sup_{x \in X} \{ \hat{\phi}_n(x) \} > K_X \text{ i.o.} \right) = 0.$$

Step III: Convex functions are determined by their subdifferential mappings (see Rockafellar
(1970), Theorem 24.9, page 239). Moreover, having a uniform upper bound $K_X$
for the norms of all the subgradients over a compact region $X$ imposes a Lipschitz
continuity condition on the convex function over $X$ (see Rockafellar (1970), Theorem 24.7,
page 237); the Lipschitz constant being $K_X$. For these reasons, it is important to have a
uniform upper bound on the norms of the subgradients of $\hat{\phi}_n$ on compact regions. The
following lemma (proved in Section 4.4) states that this can be achieved.

Lemma 3.4 Let $X \subset \mathfrak{X}^\circ$ be a compact set with positive Lebesgue measure. Then, there is
$K_X > 0$ such that

$$P^*\left( \sup_{\xi \in \partial\hat{\phi}_n(x),\ x \in X} \{ |\xi| \} > K_X \text{ i.o.} \right) = 0.$$

Step IV: For the next results we need to introduce some further notation. We will denote
by $\mu_n$ the empirical measure defined on $\mathbb{R}^{d+1}$ by the sample $(X_1, Y_1), \ldots, (X_n, Y_n)$. In
agreement with Van der Vaart and Wellner (1996), given a class of functions $\mathcal{G}$ on $D \subset
\mathbb{R}^{d+1}$, a seminorm $\|\cdot\|$ on some space containing $\mathcal{G}$ and $\varepsilon > 0$ we denote by $N(\varepsilon, \mathcal{G}, \|\cdot\|)$
the $\varepsilon$ covering number of $\mathcal{G}$ with respect to $\|\cdot\|$.

Although Lemmas 3.5 and 3.7 may seem unrelated to what has been done so far, 
they are crucial for the further developments. Lemma 3.5 (proved in Section 4.5) shows 
that the class of convex functions is not very complex in terms of entropy. Lemma 3.7 is 
a uniform version of the strong law of large numbers which proves vital in the proof of 
Lemma 3.8. 

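Lemma 3.5 Let $X \subset \mathfrak{X}^\circ$ be a compact rectangle with positive Lebesgue measure. For $K > 0$
consider the class $\mathcal{G}_{K,X}$ of all functions of the form $\psi(X)(Y - \phi(X)) 1_X(X)$ where $\psi$ ranges
over the class $\mathcal{D}_{K,X}$ of all proper convex functions which satisfy

(a) $\|\psi\|_X \le K$;

(b) $\bigcup_{x \in X} \partial\psi(x) \subset [-K, K]^d$.

Then, for any $\varepsilon > 0$ we have

$$\limsup_{n \to \infty} N(\varepsilon, \mathcal{G}_{K,X}, L_1(X \times \mathbb{R}, \mu_n)) < \infty \quad \text{almost surely,}$$

and there is a positive constant $A_\varepsilon < \infty$, depending only on $(X_1, \ldots, X_n)$, $K$ and $X$, such
that the covering numbers $N\left( \frac{\varepsilon}{n} \sum_{j=1}^n |Y_j - \phi(X_j)|, \mathcal{G}_{K,X}, L_1(X \times \mathbb{R}, \mu_n) \right)$ are bounded above
by $A_\varepsilon$, for all $n \in \mathbb{N}$, almost surely.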

The proofs of Lemmas 3.7 and 3.8 (given in Sections 4.7 and 4.8 respectively) are the
only parts in the whole proof where we must treat the different design schemes
separately. To make the argument work, a small lemma (proved in Section 4.6) for the set of
conditions {A2-A4} is required. We include it here for the sake of completeness and to
point out the difference between the schemes.

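Lemma 3.6 Consider the set of conditions {A2-A4} and a subsequence $(n_k)_{k=1}^\infty$ such that

$$\lim_{k \to \infty} \frac{1}{n_k} \sum_{j=1}^{n_k} E(\epsilon_j^2) = \sigma^2.$$

Let $(X_m)_{m=1}^\infty$ be an increasing sequence of compact subsets of $\mathfrak{X}$ satisfying $\nu(X_m) \to 1$.
Then,

$$\lim_{m \to \infty} \liminf_{k \to \infty} \frac{1}{n_k} \sum_{1 \le j \le n_k:\ X_j \in X_m} E(\epsilon_j^2) = \sigma^2.$$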

We are now ready to state the key result on the uniform law of large numbers. 

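Lemma 3.7 Consider the notation of Lemma 3.5 and let $X \subset \mathfrak{X}^\circ$ be any finite union of
compact rectangles with positive Lebesgue measure. Then,

$$\sup_{\psi \in \mathcal{D}_{K,X}} \left\{ \frac{1}{n} \left| \sum_{1 \le j \le n:\ X_j \in X} \psi(X_j)(Y_j - \phi(X_j)) \right| \right\} \xrightarrow{a.s.} 0.$$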



Step V: With the aid of all the results proved up to this point, it is now possible to show
that Lemma 3.1 is in fact true if we replace $M$ by an arbitrarily small $\eta > 0$. The proof of
the following lemma is given in Section 4.8.

Lemma 3.8 Let $X \subset \mathfrak{X}^\circ$ be any compact set with positive Lebesgue measure. Then,

(i) $P\left( \inf_{x \in X} \{ \phi(x) - \hat{\phi}_n(x) \} > \eta \text{ i.o.} \right) = 0 \quad \forall\, \eta > 0$,

(ii) $P\left( \sup_{x \in X} \{ \phi(x) - \hat{\phi}_n(x) \} < -\eta \text{ i.o.} \right) = 0 \quad \forall\, \eta > 0$.

Step VI: Combining the last lemma with the fact that we have a uniform bound on the
norms of the subgradients on compacts, we can state and prove the consistency result
on compacts. This is done in the next lemma (proof included in Section 4.9).

Lemma 3.9 Let $X \subset \mathfrak{X}^\circ$ be a compact set with positive Lebesgue measure. Then,

(i) $P\left( \inf_{x \in X} \{ \hat{\phi}_n(x) - \phi(x) \} < -\eta \text{ i.o.} \right) = 0 \quad \forall\, \eta > 0$,

(ii) $P\left( \sup_{x \in X} \{ \hat{\phi}_n(x) - \phi(x) \} > \eta \text{ i.o.} \right) = 0 \quad \forall\, \eta > 0$,

(iii) $\sup_{x \in X} \{ |\hat{\phi}_n(x) - \phi(x)| \} \xrightarrow{a.s.} 0$.

Step VII: We can now complete the proof of Theorem 3.1. Consider the class $\mathcal{C}$ of all
open rectangles $\mathcal{R}$ such that $\mathcal{R} \subset \mathfrak{X}^\circ$ and whose vertices have rational coordinates.
Then, $\mathcal{C}$ is countable and $\bigcup_{\mathcal{R} \in \mathcal{C}} \mathcal{R} = \mathfrak{X}^\circ$. Observe that Lemmas 3.2 and 3.3 imply that
for any finite union $A := \mathcal{R}_1 \cup \cdots \cup \mathcal{R}_m$ of open rectangles $\mathcal{R}_1, \ldots, \mathcal{R}_m \in \mathcal{C}$ there is, with
probability one, $n_0 \in \mathbb{N}$ such that the sequence $(\hat{\phi}_n)_{n \ge n_0}$ is finite on $\mathrm{Conv}(A)$. From
Lemma 3.9 we know that the least squares estimator converges at all rational points in
$\mathfrak{X}^\circ$ with probability one. Then, Theorem 10.8, page 90 of Rockafellar (1970) implies that
(i) holds if $\mathfrak{X}^\circ$ is replaced by the convex hull of a finite union of rectangles belonging
to $\mathcal{C}$. Since there are countably many of such unions and any compact subset of $\mathfrak{X}^\circ$
is contained in one of those unions, we see that (i) holds. An application of Theorem
24.5, page 233 of Rockafellar (1970) on an open rectangle $C$ containing $x$ and satisfying
$C \subset \mathfrak{X}^\circ$ gives (ii) and (iii). Note that (iv) is a consequence of (iii). □



Proof of Theorem 3.2. To prove the desired result we need the following lemma
(whose proof is provided in Section 4.10) from convex analysis. The result is an
extension of Theorem 25.7, page 248 of Rockafellar (1970), and might be of independent
interest.

Lemma 3.10 Let $C \subset \mathbb{R}^d$ be an open, convex set and $f$ a convex function which is finite
and differentiable on $C$. Consider a sequence of convex functions $(f_n)_{n=1}^\infty$ which are finite
on $C$ and such that $f_n \to f$ pointwise on $C$. Then, if $X \subset C$ is any compact set,

$$\sup_{x \in X,\ \xi \in \partial f_n(x)} \{ |\xi - \nabla f(x)| \} \to 0.$$

Defining the class $\mathcal{C}$ of open rectangles as in the proof of Theorem 3.1, one can use a
similar argument to obtain Theorem 3.2 from an application of Theorem 3.1 and the
previous lemma. □
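As an informal numerical illustration of Lemma 3.10 (not part of the proof), one may take the hypothetical example $f(x) = x^2$ and $f_n(x) = x^2 + \sin(nx)/n^2$ on the compact set $[-1, 1]$. Each $f_n$ is convex because $f_n''(x) = 2 - \sin(nx) \ge 1$, and $f_n \to f$ pointwise. The sketch below, whose function names are our own, approximates the supremum of the gradient gap on a grid and shows it shrinking with $n$.

```python
import math

def grad_f(x):
    # Gradient of the limit function f(x) = x^2.
    return 2.0 * x

def grad_fn(n, x):
    # Gradient of f_n(x) = x^2 + sin(n x) / n^2 (convex since f_n'' = 2 - sin(n x) >= 1).
    return 2.0 * x + math.cos(n * x) / n

def sup_gradient_gap(n, grid_size=2001):
    # Approximate the sup over the compact set [-1, 1] of |f_n'(x) - f'(x)| on a grid.
    xs = [-1.0 + 2.0 * i / (grid_size - 1) for i in range(grid_size)]
    return max(abs(grad_fn(n, x) - grad_f(x)) for x in xs)

gaps = {n: sup_gradient_gap(n) for n in (1, 10, 100, 1000)}
for n, gap in gaps.items():
    print(n, gap)
```

In this example the gap is of order $1/n$, which is consistent with the uniform convergence of gradients on compact sets asserted by the lemma; it is one instance only and proves nothing in general.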



3.5 The componentwise nonincreasing case 

The regression function is now assumed to be convex and componentwise nonin- 
creasing. Recalling the notation defined in Section 2.4, we now have that Theorems 
3.1 and 3.2 still hold with 0„ replaced by (pn- In view of the fact that the proof of the 
results is very similar to that when cp is just convex, we omit the proof and sketch the 
main differences. The proof of the main results in Section 3 relied essentially on two 
key facts: 

(i) The finite sample properties of 0„ established in Lemma 2.4. 

(ii) The vector e is the ££2 projection of (Yi,..., F„) on the 

closed, convex cone JTa: of all evaluations of proper convex functions on (Xi 

Xn). Also, note that . . . , g J^<x- 

We know from Lemma 2.8 that (pn has similar finite sample properties as its convex 
counterpart. Note that if is convex and componentwise nonincreasing {(p{Xi),..., 
(p{Xn))' e and g K" is the ^2 projection of {Yi,...,Yn) onto 

From these considerations and the nature of the arguments used to prove Theorems
3.1 and 3.2, it follows that all but one of those arguments carry forward to the
componentwise nonincreasing case; the only difference being the entropy calculation of
Lemma 3.5. At some point in that proof, one breaks the rectangle $[-K, K]^d$ into a
family of subrectangles in order to approximate the subdifferentials of the class $\mathcal{D}_{K,X}$. It is
easily seen that the same argument holds in the componentwise nonincreasing case
if one instead uses a partition of $[-K, 0]^d$ to approach the subdifferentials of the
corresponding class of componentwise nonincreasing convex functions. By doing
this, the resulting function $g$ will be convex and componentwise nonincreasing and
(30), (31) and (32) will still hold for the corresponding class $\mathcal{H}_{n,\varepsilon}$. Then, the
conclusions of Lemma 3.5 are also true for the componentwise nonincreasing case and we
can conclude that our main results are valid in this case too.



4 Proofs of the lemmas 

Here we prove the lemmas involved in the proof of the main theorem. To prove these, 
we will need additional auxiliary results from matrix algebra and convex analysis, which 
may be of independent interest and are proved in the Appendix. 

4.1 Proof of Lemma 3.1 

We will first show that the event
$[\inf_{x \in X} \{ \hat{\phi}_n(x) - \phi(x) \} > M \text{ i.o.}]$ has probability zero. Under this event, there is a
subsequence $(n_k)_{k=1}^\infty$ such that $\inf_{x \in X} \{ \hat{\phi}_{n_k}(x) - \phi(x) \} > M$ $\forall\, k \in \mathbb{N}$. Then (10) implies that for
this subsequence, with probability one, we have

F^l^^^^^J-'f'"^^^j^^ - ^^^^ 

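On the other hand, it is seen (by solving the corresponding quadratic programming
problems; see, e.g., Exercise 16.2, page 484 of Nocedal and Wright (1999)) that for any
$\eta > 0$, $m \in \mathbb{N}$

$$\inf\left\{ \frac{1}{m} \sum_{1 \le j \le m} \xi_j^2 : \frac{1}{m} \sum_{1 \le j \le m} \xi_j \ge \eta,\ \xi \in \mathbb{R}^m \right\} = \eta^2, \tag{14}$$

$$\inf\left\{ \frac{1}{m} \sum_{1 \le j \le m} \xi_j^2 : \frac{1}{m} \sum_{1 \le j \le m} \xi_j \le -\eta,\ \xi \in \mathbb{R}^m \right\} = \eta^2. \tag{15}$$

For $0 < \delta < M$, using (15) with $\eta = M - \delta$ together with (12) and (13) we get that, with
probability one, we must have

$$\liminf_{k \to \infty} \frac{1}{n_k} \sum_{j=1}^{n_k} (Y_j - \hat{\phi}_{n_k}(X_j))^2 \ge \nu(X)(M - \delta)^2.$$

Letting $\delta \to 0$ we actually get

$$\liminf_{k \to \infty} \frac{1}{n_k} \sum_{j=1}^{n_k} (Y_j - \hat{\phi}_{n_k}(X_j))^2 \ge \nu(X) M^2 > \sigma^2 = \lim_{k \to \infty} \frac{1}{n_k} \sum_{j=1}^{n_k} (Y_j - \phi(X_j))^2 \quad \text{a.s.}$$

which is impossible because $\hat{\phi}_{n_k}$ is the least squares estimator. Therefore,

$$P\left( \inf_{x \in X} \{ \hat{\phi}_n(x) - \phi(x) \} > M \text{ i.o.} \right) = 0.$$

A similar argument now using (14) gives

$$P\left( \sup_{x \in X} \{ \hat{\phi}_n(x) - \phi(x) \} < -M \text{ i.o.} \right) = 0,$$

which completes the proof of the lemma. □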
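The quadratic programs (14) and (15) used in the proof above can be sanity-checked numerically. The sketch below is only an illustration with hypothetical helper names: by the Cauchy-Schwarz inequality, $\frac{1}{m}\sum \xi_j^2 \ge (\frac{1}{m}\sum \xi_j)^2$, so every feasible point of (14) has objective value at least $\eta^2$, and the constant vector $(\eta, \ldots, \eta)$ attains it. The case (15) follows by replacing $\xi$ with $-\xi$.

```python
import random

def mean(values):
    return sum(values) / len(values)

def mean_square(values):
    return sum(v * v for v in values) / len(values)

def random_feasible_point(m, eta, rng):
    # Draw xi in R^m and shift it so that its average is at least eta,
    # i.e. xi is feasible for the program in (14).
    xi = [rng.gauss(0.0, 1.0) for _ in range(m)]
    shift = eta - mean(xi) + rng.random()  # nonnegative slack above eta
    return [v + shift for v in xi]

rng = random.Random(0)
eta, m = 0.7, 25
objective_values = [mean_square(random_feasible_point(m, eta, rng)) for _ in range(1000)]
constant_point_value = mean_square([eta] * m)  # objective at the constant vector
print(min(objective_values), constant_point_value)
```

No sampled feasible point falls below $\eta^2$, while the constant vector achieves it, in agreement with (14).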

Before we prove Lemmas 3.2 and 3.3, we need some additional results from matrix 
algebra. For convenience, we state them here, but postpone their proofs to Section A.2 
in the Appendix. 

We first introduce some notation. We write $e_j \in \mathbb{R}^d$ for the vector whose components
are given by $e_j^k = \delta_{jk}$, where $\delta_{jk}$ is the Kronecker $\delta$. We also write $e = e_1 + \ldots + e_d$ for the
vector of ones in $\mathbb{R}^d$. For $\alpha \in \{-1, 1\}^d$ we write $\mathcal{R}_\alpha$
for the orthant in the $\alpha$ direction. For any hyperplane $\mathcal{H}$ defined by the normal vector
$\xi \in \mathbb{R}^d$ and the intercept $b \in \mathbb{R}$, we write $\mathcal{H} = \{x \in \mathbb{R}^d : \langle \xi, x \rangle = b\}$, $\mathcal{H}^+ = \{x \in \mathbb{R}^d : \langle \xi, x \rangle >
b\}$ and $\mathcal{H}^- = \{x \in \mathbb{R}^d : \langle \xi, x \rangle < b\}$. For $r > 0$ and $x_0 \in \mathbb{R}^d$ we will write $B(x_0, r) = \{x \in \mathbb{R}^d :
|x - x_0| < r\}$. We denote by $\mathbb{R}^{d \times d}$ the space of $d \times d$ matrices endowed with the topology
defined by the $\|\cdot\|_2$ norm (where $\|A\|_2 = \sup_{|x| \le 1} \{|Ax|\}$ and can be shown to be equal to
the largest singular value of $A$; see Harville (2008)).
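The identity between $\|A\|_2$ and the largest singular value can be checked by hand in the $2 \times 2$ case. The following sketch is an illustration under our own choice of example matrix and function names: it computes the largest singular value from the eigenvalues of $A^\top A$ in closed form and compares it with a brute-force scan of $|Ax|$ over unit vectors.

```python
import math

def largest_singular_value_2x2(a, b, c, d):
    # For A = [[a, b], [c, d]], the eigenvalues of A^T A are the roots of
    # t^2 - (a^2 + b^2 + c^2 + d^2) t + (ad - bc)^2 = 0; the largest singular
    # value is the square root of the larger root.
    trace = a * a + b * b + c * c + d * d
    det = a * d - b * c
    disc = math.sqrt(max(trace * trace - 4.0 * det * det, 0.0))
    return math.sqrt((trace + disc) / 2.0)

def sup_over_unit_circle(a, b, c, d, steps=20000):
    # Approximate sup_{|x| <= 1} |Ax| by scanning unit vectors x = (cos t, sin t);
    # the supremum over the unit ball is attained on the unit sphere.
    best = 0.0
    for i in range(steps):
        t = 2.0 * math.pi * i / steps
        x1, x2 = math.cos(t), math.sin(t)
        best = max(best, math.hypot(a * x1 + b * x2, c * x1 + d * x2))
    return best

A = (2.0, -1.0, 0.5, 3.0)  # an arbitrary example matrix, chosen only for illustration
sigma_max = largest_singular_value_2x2(*A)
scanned = sup_over_unit_circle(*A)
print(sigma_max, scanned)
```

The scanned value never exceeds the closed-form singular value and approaches it as the angular grid is refined.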

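Lemma 4.1 Let $r > 0$. There is a constant $R_r > 0$, depending only on $r$ and $d$, such that
for any $\rho_* \in (0, R_r)$ there are $\rho, \rho^* > 0$ with the property: for any $\alpha \in \{-1, 1\}^d$ and any
$d$-tuple of vectors $\beta = \{x_1, \ldots, x_d\} \subset \mathbb{R}^d$ such that $x_j \in B(\alpha_j r e_j, \rho)$ $\forall\, j = 1, \ldots, d$, there is
a unique pair $(\xi_{\alpha,\beta}, b_{\alpha,\beta})$, with $\xi_{\alpha,\beta} \in \mathbb{R}^d$, $|\xi_{\alpha,\beta}| = 1$ and $b_{\alpha,\beta} > 0$ for which the following
statements hold:

(i) $\beta$ forms a basis for $\mathbb{R}^d$.

(ii) $x_1, \ldots, x_d \in \mathcal{H}_{\alpha,\beta} := \{x \in \mathbb{R}^d : \langle \xi_{\alpha,\beta}, x \rangle = b_{\alpha,\beta}\}$.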



where $X_\beta = (x_1, \ldots, x_d) \in \mathbb{R}^{d \times d}$ is the matrix whose $j$'th column is $x_j$.

Figure 2a illustrates the above lemma when $d = 2$ and $\alpha = (1, 1)$. The lemma states that
whatever points $x_1$ and $x_2$ are taken inside the circles of radius $\rho$ around $\alpha_1 r e_1$ and
$\alpha_2 r e_2$, respectively, $B(0, \rho_*)$ and $\{x \in \mathbb{R}^2 : |x| \ge \rho^*\}$ are contained, respectively, in
the half-spaces $\mathcal{H}^-_{\alpha,\beta}$ and $\mathcal{H}^+_{\alpha,\beta}$. Assertion (vii) of the lemma implies that all the points
in the half line $\{w_1 + t(w_2 - w_1)\}_{t \ge 1}$ should have positive coordinates with respect to the
basis $\beta$ as they do with respect to the basis $\{\alpha_j e_j\}_{j=1}^d$. We refer the reader to Section
A.2.1 for a complete proof of Lemma 4.1.



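(iii) $\min_{1 \le j \le d} \{ |\xi_{\alpha,\beta}^j| \} > 0$.

(iv) $B(0, \rho_*) \subset \mathcal{H}^-_{\alpha,\beta}$.

(v) $\{x \in \mathbb{R}^d : |x| \ge \rho^*\} \cap \mathcal{R}_\alpha \subset \mathcal{H}^+_{\alpha,\beta}$.

(vi) $B(-\alpha_j r e_j, \rho) \subset \{x \in \mathbb{R}^d : \langle \xi_{\alpha,\beta}, x \rangle < 0\}$ $\forall\, j = 1, \ldots, d$.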

(vii) For any WicB (O, and Wz^ B 
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Figure 2: Explanatory diagram for (a) Lemma 4.1 (left panel); (b) Lemma 4.2 (right panel). 

We now state two other useful results, namely Lemma 4.2 and Lemma 4.3, but defer 
their proofs to Section A.2.2 and Section A.2.3 respectively. 

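Lemma 4.2 Let $r > 0$ and consider the notation of Lemma 4.1 with the positive numbers
$\rho$, $\rho_*$ and $\rho^*$ as defined there. Take $2d$ vectors $\{x_{\pm 1}, \ldots, x_{\pm d}\} \subset \mathbb{R}^d$ such that $x_{\pm j} \in
B(\pm r e_j, \rho)$ and for $\alpha \in \{-1, 1\}^d$ write $\beta_\alpha = \{x_{\alpha_1 1}, \ldots, x_{\alpha_d d}\}$, $\xi_\alpha = \xi_{\alpha,\beta_\alpha}$, $b_\alpha = b_{\alpha,\beta_\alpha}$ and
$\mathcal{H}_\alpha = \mathcal{H}_{\alpha,\beta_\alpha}$, all in agreement with the setting of Lemma 4.1. Then, if $K = \mathrm{Conv}(x_{\pm 1}, \ldots, x_{\pm d})$
we have:

(i) $K = \bigcap_{\alpha \in \{-1,1\}^d} \{x \in \mathbb{R}^d : \langle \xi_\alpha, x \rangle \le b_\alpha\}$.

(ii) $K^\circ = \bigcap_{\alpha \in \{-1,1\}^d} \{x \in \mathbb{R}^d : \langle \xi_\alpha, x \rangle < b_\alpha\}$.

(iii) $\partial K = \bigcup_{\alpha \in \{-1,1\}^d} \mathrm{Conv}(x_{\alpha_1 1}, \ldots, x_{\alpha_d d})$.

(iv) $\partial K = \left( \bigcup_{\alpha \in \{-1,1\}^d} \{x \in \mathbb{R}^d : \langle \xi_\alpha, x \rangle = b_\alpha\} \right) \cap \left( \bigcap_{\alpha \in \{-1,1\}^d} \{x \in \mathbb{R}^d : \langle \xi_\alpha, x \rangle \le b_\alpha\} \right)$.

(v) $B(0, \rho_*) \subset K^\circ$.

(vi) $\partial B(0, \rho^*) \subset \mathrm{Ext}(K)$.

Figure 2b illustrates Lemma 4.2 for the two-dimensional case. Intuitively, the idea is
that as long as the points $x_{\pm 1}$ and $x_{\pm 2}$ belong to $B(\pm r e_1, \rho)$ and $B(\pm r e_2, \rho)$, respectively,
we will have $B(0, \rho_*)$ and $\partial B(0, \rho^*)$ as subsets of $K^\circ$ and $\mathrm{Ext}(K)$, respectively.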




Figure 3: Explanatory diagram for (a) Lemma 4.3 (left panel); (b) Lemma 3.2 (right panel). 

Lemma 4.3 Let [a, b] (zW^ be a compact rectangle and r > 0, with r < -jz2 ^fd ^ 3. For 
each a G {-1, l}"^ write Za = a + [b^ - cl^] so that {Za} ae{-i,i]'^ the set of 

vertices of [a, b]. Then, there is p > such that if Xa g B{Za + r{Za- z-a),p) V a g {-1, l}"^, 
then 

[a,fo] c Conf |xa; : a G . 

Figure 3a describes Lemma 4.3 in the two-dimensional case. As long as the points 

■^(±i,±i) are chosen in the balls of radius p around Z(+i,+i] + r(Z(±i,±i)-Z(+i,+i)), Conv[x[+i^+i)] 
will contain Conf 

4.2 Proof of Lemma 3.2 

Since any compact subset of j£° is contained in a finite union of compact rectangles, 
it is enough to prove the result when X is a compact rectangle [a,b] c X°. Let r = 
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|mini</t<£^{fo* - a^} and choose p g (0, |r), p* > and < p* < such that the con- 
clusions of Lemmas 4.1 and 4.2 hold for any ae{-l, l}^ and any /3 - (zi,...,^^) e IR'^'"^ 
with Zj eBia-i rej , p) . Take NeN such that 



— max {b'^ - a'^} < ^— p. 



(16) 



and divide X into rectangles all of which are geometrically identical to jj[0,b- a]. 
Let be any one of the rectangles in the grid and choose any vertex zq of satisfying 

zo = argmax-^ max \ z^ - , - zH\. 
Then, from the definition of zq and r, there is ao g {-1, 1}"^ such that 



5(zo,r)n(zo + ^ao) cX. 



Additionally, define 



B 



Bz 
A 



B 



B 



Zq, 



Zq + 



3p, 



-.ao, 



P* \ 



A. 



J 



J 



Bizo + alrej,p) n (zo + ^ao) \/ j ^l,...,d, 
Bizo-al^rej,p) yj = l,...,d. 



Observe that all the sets in the previous display have positive Lebesgue measure and 
that the yl_j's are not necessarily contained in X. Let Mi = ||0||j^,Mo > 



V'min{y(Bi),v(B2),v(Ai),...,v(Arf)}' 

M- Mi + Mo and iC^ > 6M. Also, notice that '^^ c because of (16). We will argue that 



inf{0„(x)}<-r<^i.o. =0. 



(17) 



From Lemma 3.1, we know that 



P n inf {|0„(x)-0(x)|}<Moa.a. 



1, 



(18) 
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so there is, with probability one, no £ N such that infxeAj {\4>nM - <p{x)\} < Mq for any 
n> no and any j = l,...,d. 

Assume that the event [infjce<g{0„(x)} < -Keg i.o.] is true. Then, there is a subse- 
quence nic such that infxe<^{0„j.(x)} < -K^g for all A; g N. Fix any k > uq. We know that 
there is X* g 'i^ c such that < -K^. In addition, for j = l,...,d, there are 

Z j G Aj such that ) - 0(Z )| < Mq, which in turn implies (f>nkiZ j ) < M. 

Pick any Z j g A- ,■ and let K = Conv{Z+i, Z+d) = zo + Conv{Z+i - Zq,..., Z+d - Zq). 

Take any xe Bz. We will show the existence of X* e Conv [-^aii' • • • - ■^a'^d] ^'^^h that 
xe Conv (X* , X* ) , as shown in Figure 3b for the case d = 2. We will then show that the 
existence of such an X* implies that 



\(pix)-(f)ntix)\> Mo. 



(19) 



Consequently, since x is an arbitrary element of B2 we will have 



inf {0„(x)} < -K<g i.o. 



n n inf {|0„(x)-0(x)|}<Moa.a. 



inf {|0(x) -0„^.(x)|} > Mo i.o. 

xeBz 

But from Lemma 3.1, the event on the right is a null set. Taking (18) into account, we 
will see that (17) holds and then complete the argument by taking ~ max<^{^r<^}. 

To show the existence of X* consider the function 1// : [R ^ IR'^ given by %f[t) = X* + 
t{x - X*). The function 1//^ is clearly continuous and satisfies i//(0) = X* and ■^r{\) = x g 
B2 <= K° . That B2 c iC" is a consequence of Lemma 4.1, {iv). The set K is bounded, so 
there is T > 1 such that i/A(r) g Ext(i(r) = U^\K. The intermediate value theorem then 
implies that there is t* e (1, T) such that X* := y/it*) e dK. Observe that by Lemma 4.2 
iiii) we have 

dK= y Conv[Z^i^,...,Z^dd]. 
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Lemma 4. 1 (/) implies that iZ^ii - Zq,..., Z^d^ - Zq} forms a basis of K"^ so we can write 
X* - zo = T.i=idHz j . - Zq). Moreover, Lemma 4.1 {vii) implies that 0^ > for every 
j -l,...,da.s6 - = {Z^\'y-ZQ,...,Z^d^-ZQ)~^{X* -Zq). Here we apply Lemma 

4.1 {vii) with if i = X* g B\, W2 = xe B2 and t* > 1. 

For a G {-1, 1}"^ consider the pair (^a, ba) g K'^ x [R as defined in Lemma 4.2 for the set 
ofvectors {Z+i-zo,...,Z+d-Zo} (here we move the origin to Zq). Observe that Lemma 4.1 
iii) implies that {i,a^,Z j .- zq) = ba^ for all j - l,...,d. Consequently, {^ao'^* ~ ■^o) - 
baoYf^j^iO^ > but since X* g dK, Lemma 4.2 [iv) implies that {£,aQ,X* - Zq) < bag and 
hence Y.'j^i 6^ <l. Additionally, for a 7^ ao we can write {^a,X* - zq) as 



Y^Bj{U,Zi.-Zo)= Y. d^ba+ E ej{U,Zi.-Zo)<ba 



(20) 



as {^a,Z^i - Zq) = ba (by Lemma 4.1 {ii)) and {^ayZ_^^i - zo> < (by Lemma 4.1 {vi)) 
for every 7 = \,...,d. Since {i,a,u)-ZQ) = ba for all w g Conv[Zg^ii,...,Z^da) and all 
a G {-1.1}"^, (20) and the fact that X* e ai^: imply that X* e Conui^Z^i^, Z^d^. Hence 
0„(X*) < if^ ej(p„^{Z J .) < M. We therefore have 

0„,(X*)<M , ^n,{X.)<-K<s, (21) 

X, + — (X*-XJ = X. (22) 
t* 



Since X* g Bi and d > 1 we have 



1 

|zo-X*|<-p*. (23) 
8 



By using the triangle inequality we get the following bounds 

1 1 

-p* < |Zo--^l < -P*- (24) 
4 2 

And from Lemma 4.1 {i v) and the fact that <(f . ^* ) = ^ao ^1^° obtain 

|zo-^*l>P*- (25) 
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From (22) we know that t* = '^J"^]' . Using the triangle inequality with (23), (24) and 
(25) one can find lower and upper bounds for I (as \X*-X*\ > \X* -zo\-\zo-X*\) 

and |x - X* I (as |x - X* | < \x- zo\ + \zo- X* |), respectively, to obtain t* > |. Then, (21) 
and (22) imply 



0«fc(x) 



1 - ^) + ^KiX*) <-^K<^ + ^-M< -M. 



Consequently, 

I0(x)-0„,(x)| >M-Mi = Mo. 
This proves (19) and completes the proof. 



□ 



4.3 Proof of Lemma 3.3 

Assume without loss of generality that X is a compact rectangle. Let {Za : a g {-1, l}'^} 
be the set of vertices of the rectangle. Then, there is r e (0, 1) such that B{Za, r) c X° V 
a G {-1, 1}"^. Recall that from Lemma 4.3, there is < p < such that for any {rja : a £ 
{-1, 1}^} ifr]a £ BiZa + ^iza- Z-a),p) then T^Conv[r]a:ae {-1, 1}^). 



Let Aa ^B{Za + \r{Za- z-a), 7) and Mq > 



\/ m\n{v[Aay.ae{-\,\}'>\ 



and choose 



Mi= sup {\(p{x)\}. 



Take K^> Mq + Mi. Since 



n 

Vael-1,11'' 



inf {|0„(x) -0(x)|} < Mo, a.a. 

xeAa 



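by Lemma 3.1, there is, with probability one, $n_0 \in \mathbb{N}$ such that for any $n \ge n_0$ we can find
$\eta_\alpha \in A_\alpha$, $\alpha \in \{-1, 1\}^d$, such that $|\hat{\phi}_n(\eta_\alpha) - \phi(\eta_\alpha)| < M_0$. It follows that $\hat{\phi}_n(\eta_\alpha) < K_X$ $\forall\, \alpha \in
\{-1, 1\}^d$. Now, using Lemma 4.3 we have $X \subset \mathrm{Conv}(\eta_\alpha : \alpha \in \{-1, 1\}^d)$ and the convexity
of $\hat{\phi}_n$ implies that $\hat{\phi}_n(x) < K_X$ for any $x \in X$. □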
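The final step of this proof rests on an elementary fact: a convex function on the convex hull of finitely many points is bounded above by its maximum over those points. The sketch below illustrates this with a hypothetical max-affine function on the unit square; the affine pieces, the vertices and the helper names are assumptions made only for the example.

```python
import random

def max_affine(x, pieces):
    # A convex function given as a maximum of affine pieces (slope, intercept).
    return max(s[0] * x[0] + s[1] * x[1] + c for s, c in pieces)

rng = random.Random(1)
pieces = [((rng.uniform(-2, 2), rng.uniform(-2, 2)), rng.uniform(-1, 1)) for _ in range(6)]
vertices = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
vertex_max = max(max_affine(v, pieces) for v in vertices)

def random_convex_combination(points, rng):
    # Random weights normalised to sum to one give a point of Conv(points).
    weights = [rng.random() for _ in points]
    total = sum(weights)
    weights = [w / total for w in weights]
    return tuple(sum(w * p[k] for w, p in zip(weights, points)) for k in range(2))

samples = [random_convex_combination(vertices, rng) for _ in range(2000)]
worst = max(max_affine(x, pieces) for x in samples)
print(worst, vertex_max)
```

Every sampled point of the hull has a value no larger than the maximum over the vertices, which is the bound used to pass from the points $\eta_\alpha$ to all of $X$.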
4.4 Proof of Lemma 3.4



Assume that X = [a, b] is a rectangle with vertices {Za : a g {- 1, l}*^}. The function = 
inf^gg^^^{|x-77|} is continuous on IR*^ so there is e dX such that y/ix*) - infj^eaxl^ Wl- 
Observe that y/ix^,) > because x* g 5X c X°. By Lemma 4.3, there is a r < |i/a(x*) for 
which there exists p < \ r such that whenever rja^ Aai- B^Za + [ \Ta-z-l\ ] ' p] 
a G {-1,1}^ and 



iiT^ = Conf Za + -r\ 

V 2 v|Zq:~ 2-0:1 

Kr^ = Conv^Tja: ae{-l,l}'' 



aG{-Ll}^ 



we have 



XczKzcK°czKrjCzX°. (26) 



Let Mo > , and Mi g [R be such that 

Vmin{vUa:):ae{-l,l!''! 



p(inf{^„(x)}<-Moi.o.] =0 and Mi = sup {0(x)}. 

VxeX / / 1 



FromLemmas 3.1 and3.2 we can find, with probability one, g N such that infj^exl^wW} > 
-Mo and infxeAa^l^n M - 0(x) |} < Mo for any n > Hq. Define 

M = Ml + Mo 

4\b-a\ 

Kx = -. ^M 

rmmi<j<d{bJ - aJ} 

and take any n > Hq. Then, for any a g {-1, l}"^ we can find rja g Aa such that {(pniija) - 
(piria) \ < Mo. Then, (26) implies that 0„(x) < M Vx g X. Take then x g X and ^ g dcf)„ix). 
A connectedness argument, like the one used in the proof of Lemma 3.2, implies that 

. rmini<;<rf{Z;^-fl^} 

there is > such that x+ e oKr). But then we must have > 2\i\\b-a\ ^ 

consequence of (26), since the smallest distance between dKz and dX is '^"^^'^^2\b-a\ — ~~ 
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Figure 4: The smallest distance between dKz and dX is at least '^"'"^2|fr^l|' — ~- 

and dKrj <= ExtiKz). This can be seen by taking a look at Figure 4, which shows the situ- 
ation in the two dimensional case. Thus, using the definition of subgradients, 

rmini<,<rf{fo-' - a-'} 
2\^\\b-a\ 

which in turn implies $|\xi| \le K_X$. We have therefore shown that, with probability one, we
can find $n_0 \in \mathbb{N}$ such that $|\xi| \le K_X$ $\forall\, \xi \in \partial\hat{\phi}_n(x)$, $\forall\, x \in X$, $\forall\, n \ge n_0$. This completes the
proof. □

4.5 Proof of Lemma 3.5 

The result is obvious for conditions {A1-A3} and {A5-A7} when = 0. So we assume that 
> for {A1-A3} and {A5-A7}. Let e > and M = sup^extl-^l^- Choose 5>0 satisfying 

< 5 < (27) 



for n large. Notice that 5 is well-defined and the quantity on the left is positive, finite 
and bounded away from as lim ^ T.'J^i I Yj - <piXj) \ > a.s. under any set of regular- 
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ity conditions (for {A2-A4}, conditions A4-(i) and A4-(iii) imply that we can apply the 
version of the strong law of large number for uncorrelated random variables, as it ap- 
pears in Chung (2001), page 108, Theorem 5.1.2 to the sequence QejDJl^', for {A1-A3} 
and {A5-A7} this is immediate as cr^ > 0) . The definition of the class &k,x implies that all 
its members are Lipschitz functions with Lipschitz constant bounded by K\/d, a con- 
sequence of Rockafellar (1970), Theorem 24.7, page 237. Hence, (27) implies that 



sup {|i/a(x) -i/A(y)|} 



< 



x,yeX,y/eSifc,x 

Now, define iV„ g N by iV„ = diamm ^ 2K^[d ^ ^here [-1 denotes the ceiling function 
Observe that (27) implies 

( ri\ 2i2M + KVd + l) 1 1 " 
iV„ - 1 < [diam(X) v ZKVdj - Z I ^; " <P'-^j 



(28) 



Then, we can divide the rectangles X and [-K,K]^ in sub rectangles, all of which 
have diameters less than 5. In other words, we can write 

[-K,K]'^ = U Rj 
X = \J Vj 

with diam(i?y) < 6 and diam(yj) < 5 V j = I,... N^. In the same way, we can divide the 
interval [-K,K] in iV„ subintervals J^i , . . . , each having length less than 6. For each 
j = l,...,N^,\et^j and Xj be the centroids of Rj and Vj respectively and for 7 = 1, . . . , iV„ 
let rjj be the midpoint of J'j. Consider the class of functions J^n.e defined by 

^„,e = | max {{^s,--Xt) + Vj}:^^{l,...,N^fx{l,...,NA. 



,j2d+l 



Observe that the number of elements in the class J£'n,e is bounded from above by 2^" . 
Now, take any y/ e Qik^h- Pick any 'Ej g dif/{Xj). Then, for any j such that Xj e X, there 
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are sjjj e {!,..., N^} and r j e {!,..., Nn) such that \Ej-^sj\> \Xj - Xt- \ and |i/A(Xf .) - r]j. \ 
are all less than 5. We then have that 



sup] {£,Sj,x-Xtj) + r]j.-[{'Ej,x-Xj) + if/{Xj)] \ 
< 2M\^sj-^j\ + Ks/d\Xtj-Xj\ + 6<{2M + KVd + l)d (29) 

by an application of the Cauchy-Schwarz inequality. But then, (27) implies that if we 
define the functions ifr and g as 

fix) = mdiX{{Ej,x-Xj) + y/iXj)}, 



g(x) = max{(^o.,x-Xf.> +77t,1 



then we have 



ijf{Xj) = y/iXj) for j such that Xj e X, (30) 
e 

g G -^^.e. (32) 



\g-f\h < (from (29)), (31) 



Note that (30) follows from the definition of subgradients. All these facts put together 
give that for any f{x, y) = y/{x) (y - <p{x)) e '^k,x> W ^ ^k,j. there is g £ ^«,e such that 

J \f{x,y) - g{x){y - (p{x))\ijin{dx,dy) <e 

and hence 

But then, the strong law of large numbers and (28) give that limN^„ < oo a.s. Further- 
more, by replacing e with \ Yj - <piXj) \ in the entire construction just made, we 
can see that the covering numbers 

^(t I"=i 1^; - (/>(^7)l,^^ic,x.lLi(X X [R,)U„)j depend neither on the Y's nor on 0. Taking 
Be = (diam(x) V jf^] 2i2M+KVd+i) ^ ^ - 2^e''^^ ig secn that thc second part of the result 

holds. □ 
4.6 Proof of Lemma 3.6



Note that for every m, we have 



Taking limit inferior on both sides as A; ^ oo, we get 



suplEfe^)} 



o-2<lim— X Efc2]+v(X\X^)sup{Efe2]} 



Now taking the limit as m ^ oo we get the result because the opposite inequality is 
trivial. □ 

4.7 Proof of Lemma 3.7 

We may assume that $X$ is a compact rectangle. Here we need to make a distinction
between the design schemes. In the case of the stochastic design, the proof is an
immediate consequence of Lemma 3.5 and Theorem 2.4.3, page 123 of Van der Vaart and
Wellner (1996). Thus, we focus on the fixed design scenario.

For notational convenience, we write $M = \sup_{j \in \mathbb{N}} \{ E(\epsilon_j^2) \}$ and $\sum_{X_j \in X}$ instead of the
more cumbersome $\sum_{1 \le j \le n:\ X_j \in X}$. Letting $\epsilon_j = Y_j - \phi(X_j)$ (and using the same notation
as in the proof of Lemma 3.5) first observe that the random quantity



sup 



= sup ^ 

me? 



sup 



n 



> . 



by (30), (31) and (32) and is thus measurable. 

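All of the following arguments are valid for both {A1-A3} and {A2-A4}. Lyapunov's
inequality (which states that for any random variable $X$ and $1 \le p \le q \le \infty$ we have
$\|X\|_p \le \|X\|_q$) and the strong law of large numbers imply

$$\limsup_{n \to \infty} \frac{1}{n} \sum_{1 \le j \le n} |\epsilon_j| = \limsup_{n \to \infty} \frac{1}{n} \sum_{1 \le j \le n} E(|\epsilon_j|) \le \sqrt{M} \quad \text{a.s.} \tag{33}$$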
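Lyapunov's inequality is used here with $p = 1$ and $q = 2$, which gives $E(|\epsilon_j|) \le \sqrt{E(\epsilon_j^2)} \le \sqrt{M}$. As a quick illustration, the sketch below checks the monotonicity of the empirical $L_p$ norms of a simulated sample; the Gaussian sample and the function name `lp_norm` are assumptions made for the example only.

```python
import random

def lp_norm(sample, p):
    # Empirical L_p norm: (mean of |x|^p)^(1/p), i.e. the L_p norm under the
    # uniform distribution on the sample.
    return (sum(abs(x) ** p for x in sample) / len(sample)) ** (1.0 / p)

rng = random.Random(2)
sample = [rng.gauss(0.0, 1.5) for _ in range(5000)]
exponents = (1, 1.5, 2, 3, 4)
norms = [lp_norm(sample, p) for p in exponents]
print(list(zip(exponents, norms)))
```

The norms are nondecreasing in $p$, as the inequality requires for any probability measure, including the empirical one.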
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Let 77 > 0. From Lemma 3.5 we know that the covering numbers a„ := iv(j(iy^i if, -</>(Xj)|,'^^,x,li(x x R,^ln 
are not random and uniformly bounded by a constant A^. Therefore, for any n e N we 
can find a class ^„ c @^ x with exactly a„ elements such that {y/ix) (y-0(x))}i^e^„ forms 
an I i^' " l]-net for ^Sk,! with respect to Li (X x IR, ju„). It follows that 



sup - 



l< - ^ \ej\ + sup J 
J " 1< ;<7! [ 



(34) 



With (34) in mind, we make the following definitions 



Bn 



sup 



sup < 



n 

1 

n 



X,eX 



l<;<Lv/«j2: X^eX 
1 

n^<j<k: Xj£X 

where L-J denotes the floor function. Now, pick 6 >0 and observe that 



sup 



k 2 



P{Bn>d) = P U 



E y^^Xj)ej 



Xj£X 



> n6 



The Borel-Cantelli Lemma then implies that P (B„2 > 5 i.o.) = 0. Letting 5^0 through 
a decreasing sequence gives 

(35) 



B„. ^ 0. 



On the other hand, the definition of C„ implies that 

^ "l<7<Lv/7^j2 
which together with (35) and (33) gives 



(36) 



limC„ <r]\M almost surely. 



(37) 
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Note that (36) is a consequence of the fact that for any y/ e there exists g e 
such that if ^„ = {1 < j < [s/n\ ^ : £ X}, then 



- j: mXj)-giXj))ej 



+ 



Now, a similar argument to the one used in (35) gives 



+ 



B 



n 



P(£»„>5) 



U 

E p 



E 

E 

«2<j<A;:XjeX 



>k6 



>k5 



^ K^Mik-n^) K^MA„i2n + iy 

E — — — < 



n2<A:<(„+i)2 



Again, one can use (38) and the Borel-Cantelli Lemma to prove that 

P (D„ > 6 i.o.) - and then let 5 ^ through a decreasing sequence to obtain 



Finally, one sees that 



sup 



- ^ y/[XjKYj-cl){Xj)) 



n 



Bn < Cn + D^^j, 



which combined with (37) and (39) gives 

limB„ < rj^/M almost surely. 



Taking (34) into account we get 

1 



lim sup - 



^ ^iXj)iYj-(piXj)) 



l<j<n:Xj£X 



ZrjVM almost surely. 



Letting $\eta \to 0$ we get the desired result.
4.8 Proof of Lemma 3.8

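We can assume, without loss of generality, that $X$ is a finite union of compact rectangles.
Consider a sequence $(X_m)_{m=1}^\infty$ satisfying the following properties:

(a) $X \subset X_m \subset \mathfrak{X}^\circ$ $\forall\, m \in \mathbb{N}$.

(b) $\nu(X_m) > 1 - \frac{1}{m}$ $\forall\, m \in \mathbb{N}$.

(c) $X_m \subset X_{m+1}$ $\forall\, m \in \mathbb{N}$.

(d) Every $X_m$ can be expressed as a finite union of compact rectangles with positive
Lebesgue measure.

The existence of such a sequence follows from the inner regularity of Borel probability
measures on $\mathbb{R}^d$ and from the fact that since $\mathfrak{X}^\circ$ is open, for any compact set $F \subset \mathfrak{X}^\circ$ we
can find a finite cover composed by compact rectangles with positive Lebesgue
measure and completely contained in $\mathfrak{X}^\circ$. Also, from Lemmas 3.2, 3.3 and 3.4 and the fact
that $\mathfrak{X} \subset \mathrm{Dom}(\phi)$, for any $m \in \mathbb{N}$ we can find $K_m > 0$ such that

$$\|\phi\|_{X_m} \le K_m \quad \text{and} \quad P\left( \|\hat{\phi}_n\|_{X_m} > K_m \text{ i.o.} \right) = 0; \tag{40}$$

$$\sup_{\xi \in \partial\phi(x),\ x \in X_m} \{ |\xi| \} \le K_m \quad \text{and} \quad P^*\left( \sup_{\xi \in \partial\hat{\phi}_n(x),\ x \in X_m} \{ |\xi| \} > K_m \text{ i.o.} \right) = 0. \tag{41}$$

Fix $\eta > 0$ and consider the sets

$$A = \left[ \inf_{x \in X} \{ \phi(x) - \hat{\phi}_n(x) \} > \eta \text{ i.o.} \right],$$

$$B = \left[ \|\hat{\phi}_n\|_{X_m} \le K_m \text{ a.a.} \right],$$

$$C = \left[ \sup_{\xi \in \partial\hat{\phi}_n(x),\ x \in X_m} \{ |\xi| \} \le K_m \text{ a.a.} \right].$$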

Suppose now that AnBnC is known to be true. Then, there is a subsequence ('^fc)^! 
such that inf;,ex{0(x) - ^^^(x)} > V A; £ N and ^ ij^^ E (e^j ^ ^2 taking (40) and (41) 
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into account, we have that for k large enough the inequality 
1 1 



implies 



— - — 77-4 sup 



''A: ■ -- --- 



{l<i<ni::X;eX 



Thus, from Lemma 3.7 we can conclude that

$$\liminf_{k \to \infty} \frac{1}{n_k} \sum_{1 \le j \le n_k} (Y_j - \hat{\phi}_{n_k}(X_j))^2 \ge \nu(X_m)\sigma^2 + \nu(X)\eta^2 \quad \text{if {A1-A3} hold.}$$

Under {A2-A4} and {A5-A7} the left-hand side of the last display is bounded from below
by

$$\liminf_{k \to \infty} \frac{1}{n_k} \sum_{1 \le j \le n_k:\ X_j \in X_m} (Y_j - \phi(X_j))^2 + \nu(X)\eta^2$$

and

$$\int_{X_m \times \mathbb{R}} (y - \phi(x))^2 \mu(dx, dy) + \nu(X)\eta^2,$$

respectively.

Finally, using (a)-(d), the strong law of large numbers (for {A2-A4} we can apply a
version of the strong law of large numbers for independent random variables thanks to
condition A4-(ii); see Williams (1991), Lemma 12.8, page 118 or Folland (1999), Theorem
10.12, page 322) and Lemma 3.6 we can let $m \to \infty$ to see that, under any of {A1-A3},
{A2-A4} or {A5-A7},

$$\liminf_{k \to \infty} \frac{1}{n_k} \sum_{1 \le j \le n_k} (Y_j - \hat{\phi}_{n_k}(X_j))^2 \ge \sigma^2 + \nu(X)\eta^2$$

which is impossible because $\hat{\phi}_{n_k}$ is the least squares estimator.
Therefore $P^*(A \cap B \cap C) = 0$ and, since $P^*(B \cap C) = 1$,

$$P(A) = P\left( \inf_{x \in X} \{ \phi(x) - \hat{\phi}_n(x) \} > \eta \text{ i.o.} \right) = 0.$$

This finishes the proof of (i). The second assertion follows from similar arguments. □

4.9 Proof of Lemma 3.9 

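We can assume, without loss of generality, that $X$ is a finite union of compact rectangles.
Pick $K_X$ such that

$$\sup_{\xi \in \partial\phi(x),\, x \in X} \{|\xi|\} \le K_X \quad\text{and}\quad P^*\Big(\sup_{\xi \in \partial\hat\phi_n(x),\, x \in X} \{|\xi|\} > K_X \ \text{i.o.}\Big) = 0.$$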



Let t; > and 6 = We can then divide X in M sub rectangles {"t^i, . . . , "t^Ml all having 



diameter less than 6. Define the events 

A = 
B = 



77 

n inf {0„(x) - (p{x)} < - a.a. 



sup {|^|} < Kx di.di. 

xeX 

We will show that $A \cap B \subset \big[\sup_{x \in X}\{\hat\phi_n(x) - \phi(x)\} \le \eta \ \text{a.a.}\big]$. Suppose $A \cap B$ is true. Then,
there is iV e N such that for any n > N we can find e "tf^ such that - 
fc) < ^- Moreover, we can make N large enough such that for any n > N, Kx is 
an upper bound for all the subgradients of 0„ on X. Then, for any ^ £ we obtain from 
the Lipschitz property, 



Therefore, 



< - + Kx6 + Kx6 < T]. 



sup{0„(x) -0(x)} < r/ y l<k< My n> N 
which implies

$$\sup_{x \in X}\{\hat\phi_n(x) - \phi(x)\} \le \eta \quad \forall\, n \ge N.$$



Considering Lemmas 3.8-(ii) and 3.4, the inclusion $A \cap B \subset \big[\sup_{x \in X}\{\hat\phi_n(x) - \phi(x)\} \le \eta \ \text{a.a.}\big]$ and $P^*(A \cap B) = 1$,
we obtain (ii). The first assertion follows from similar arguments and (iii) is a direct
consequence of (i) and (ii). □
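
The Lipschitz step used in this proof can be sketched as the following chain of inequalities. This is an illustration under stated assumptions rather than the authors' display: the constant $c \in (0,1)$ and the mesh $\delta > 0$ are placeholders (the exact fractions are not legible in this copy) assumed to satisfy $c\,\eta + 2K_X\delta \le \eta$; $x_k$ denotes a point of the $k$-th subrectangle where $\hat\phi_n - \phi$ is below $c\,\eta$; and $x$ is any other point of the same subrectangle, so that $|x - x_k| < \delta$. Both $\hat\phi_n$ and $\phi$ are $K_X$-Lipschitz on each subrectangle because their subgradients there are bounded by $K_X$.

```latex
\begin{align*}
\hat\phi_n(x) - \phi(x)
  &= \bigl(\hat\phi_n(x_k) - \phi(x_k)\bigr)
     + \bigl(\hat\phi_n(x) - \hat\phi_n(x_k)\bigr)
     + \bigl(\phi(x_k) - \phi(x)\bigr) \\
  &< c\,\eta + K_X\,|x - x_k| + K_X\,|x_k - x| \\
  &\le c\,\eta + K_X\,\delta + K_X\,\delta \;\le\; \eta .
\end{align*}
```

Since the bound does not depend on $k$ or on the choice of $x$, taking the supremum over all subrectangles gives the uniform bound on $X$.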

4.10 Proof of Lemma 3.10

Throughout this proof we will denote by $B$ the unit ball (w.r.t. the Euclidean norm) in
$\mathbb{R}^d$. From Theorem 25.5, page 246 of Rockafellar (1970) we know that $f$ is continuously
differentiable on 'io. Let 

h*^ inf {|^-r/|}>0. 
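Pick $\epsilon > 0$. We will first show that there is $n_\epsilon \in \mathbb{N}$ such that

$$\langle \xi, \eta \rangle \le \langle \nabla f(x), \eta \rangle + \epsilon, \quad \forall\, \xi \in \partial f_n(x),\ \forall\, x \in X,\ \forall\, \eta \in B,\ \forall\, n \ge n_\epsilon. \tag{42}$$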

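Suppose that such an $n_\epsilon$ does not exist. Then, there is an increasing sequence $(m_n)_{n=1}^{\infty}$
such that for any $n \in \mathbb{N}$ we can find $x_{m_n} \in X$, $\xi_{m_n} \in \partial f_{m_n}(x_{m_n})$, $\eta_{m_n} \in B$ satisfying
$\langle \xi_{m_n}, \eta_{m_n} \rangle > \langle \nabla f(x_{m_n}), \eta_{m_n} \rangle + \epsilon$. But $X$ and $B$ are both compact, so there are $x_* \in X$, $\eta_* \in B$ and a
subsequence $(k_n)_{n=1}^{\infty}$ of $(m_n)_{n=1}^{\infty}$ such that $x_{k_n} \to x_*$ and $\eta_{k_n} \to \eta_*$. Then, for any $0 < h < h_*$
we have

$$\frac{f_{k_n}(x_{k_n} + h\,\eta_{k_n}) - f_{k_n}(x_{k_n})}{h} \ \ge\ \langle \xi_{k_n}, \eta_{k_n} \rangle \ >\ \langle \nabla f(x_{k_n}), \eta_{k_n} \rangle + \epsilon \quad \forall\, n \in \mathbb{N},$$

and therefore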

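$$\lim_{n\to\infty} \lim_{h \downarrow 0} \frac{f_{k_n}(x_{k_n} + h\,\eta_{k_n}) - f_{k_n}(x_{k_n})}{h} \ \ge\ \langle \nabla f(x_*), \eta_* \rangle + \epsilon.$$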

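But this is impossible in view of Theorem 24.5, page 233 of Rockafellar (1970). It follows
that we can choose some $n_\epsilon \in \mathbb{N}$ with the property described in (42). By noting that
$-B = B$, we can conclude from (42) that

$$|\langle \xi, \eta \rangle - \langle \nabla f(x), \eta \rangle| \le \epsilon \quad \forall\, \xi \in \partial f_n(x),\ \forall\, x \in X,\ \forall\, \eta \in B,\ \forall\, n \ge n_\epsilon.$$

By taking $\eta_\xi = \frac{\xi - \nabla f(x)}{|\xi - \nabla f(x)|}$ when $\xi \ne \nabla f(x)$ we get

$$\sup_{\xi \in \partial f_n(x),\, x \in X} \{|\xi - \nabla f(x)|\} \le \epsilon \quad \forall\, n \ge n_\epsilon.$$

Since $\epsilon > 0$ was arbitrarily chosen, this completes the proof. □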
A Appendix 

A.1 Results from convex analysis

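Lemma A.1 Let $z \in \mathbb{R}^n$, $x_1, \dots, x_n \in \mathbb{R}^d$ and define the function $g : \mathbb{R}^d \to \overline{\mathbb{R}}$ by

$$g(x) = \inf\Big\{ \sum_{k=1}^{n} \theta^k z^k \ :\ \sum_{k=1}^{n} \theta^k = 1,\ \sum_{k=1}^{n} \theta^k x_k = x,\ \theta \ge 0,\ \theta \in \mathbb{R}^n \Big\}.$$

Then, $g$ defines a convex function whose effective domain is $\mathrm{Conv}(x_1, \dots, x_n)$. Moreover,
if $\mathcal{K}_{x,z}$ is the collection of all proper convex functions $\psi$ such that $\psi(x_j) \le z^j$ for all $j =
1, \dots, n$, then $g = \sup_{\psi \in \mathcal{K}_{x,z}} \{\psi\}$.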

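Proof: To see that $g$ defines a convex function, for any $x \in \mathbb{R}^d$ write

$$A_x = \Big\{ \theta \in \mathbb{R}^n \ :\ \sum_{k=1}^{n} \theta^k = 1,\ \sum_{k=1}^{n} \theta^k x_k = x,\ \theta \ge 0 \Big\}$$

and observe that for any $x, y \in \mathbb{R}^d$, $t \in (0, 1)$, $\theta \in A_x$ and $\vartheta \in A_y$ we have $t\theta + (1-t)\vartheta \in
A_{tx + (1-t)y}$ and hence

$$\frac{g(tx + (1-t)y) - (1-t) \sum_{k=1}^{n} \vartheta^k z^k}{t} \ \le\ \sum_{k=1}^{n} \theta^k z^k.$$

Taking infimum over $A_x$ and rearranging terms, we get

$$\frac{g(tx + (1-t)y) - t\,g(x)}{1-t} \ \le\ \sum_{k=1}^{n} \vartheta^k z^k$$

and taking now the infimum over $A_y$ gives the desired convexity. The convention that
$\inf(\emptyset) = +\infty$ shows that the effective domain is precisely the convex hull of $x_1, \dots, x_n$.
Finally, for any $\psi \in \mathcal{K}_{x,z}$ and $x \in \mathrm{Conv}(x_1, \dots, x_n)$ we have, for $\theta \in \mathbb{R}^n$ with $\theta \ge 0$,
$x = \sum_{j=1}^{n} \theta^j x_j$ and $\sum_{j=1}^{n} \theta^j = 1$,

$$\psi(x) \ \le\ \sum_{j=1}^{n} \theta^j \psi(x_j) \ \le\ \sum_{j=1}^{n} \theta^j z^j$$

since $\psi(x_j) \le z^j$ for any $j = 1, \dots, n$. The definition of $g$ as an infimum then implies that
$\psi(x) \le g(x)$ $\forall\, \psi \in \mathcal{K}_{x,z}$, $x \in \mathrm{Conv}(x_1, \dots, x_n)$. The result then follows from the fact that
$g \in \mathcal{K}_{x,z}$. □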
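
To make the definition of $g$ concrete, here is a minimal Python sketch for the one-dimensional case $d = 1$ only. It relies on a standard fact about linear programs that is not stated in the text: with two equality constraints, the infimum over the simplex is attained at weights with at most two nonzero entries, so it suffices to scan single points and pairs of points. The function name `lower_envelope_1d` and the tolerance `tol` are hypothetical choices made for this illustration.

```python
from itertools import combinations_with_replacement


def lower_envelope_1d(xs, zs, x, tol=1e-12):
    """Evaluate g(x) from Lemma A.1 when d = 1.

    g(x) = inf { sum_k theta_k * z_k : theta >= 0, sum_k theta_k = 1,
                 sum_k theta_k * x_k = x }.
    Assumption (LP vertex argument): only weight vectors supported on one
    or two of the points x_k need to be examined.
    """
    best = float("inf")  # inf over the empty set is +infinity
    for i, j in combinations_with_replacement(range(len(xs)), 2):
        lo, hi = min(xs[i], xs[j]), max(xs[i], xs[j])
        if not (lo - tol <= x <= hi + tol):
            continue  # x is not a convex combination of x_i and x_j
        if abs(xs[j] - xs[i]) <= tol:
            value = min(zs[i], zs[j])  # both points coincide with x
        else:
            t = (x - xs[i]) / (xs[j] - xs[i])  # weight placed on x_j
            value = (1 - t) * zs[i] + t * zs[j]
        best = min(best, value)
    return best
```

With points `xs = [0.0, 1.0, 2.0]` and values `zs = [1.0, 0.0, 1.0]`, the function returns a finite value inside the interval $[0, 2]$ and `inf` outside it, which matches the statement that the effective domain is the convex hull of the points.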



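Lemma A.2 Let $z \in \mathbb{R}^n$, $x_1, \dots, x_n \in \mathbb{R}^d$ and define the function $h : \mathbb{R}^d \to \overline{\mathbb{R}}$ by

$$h(x) = \inf\Big\{ \sum_{k=1}^{n} \theta^k z^k \ :\ \sum_{k=1}^{n} \theta^k = 1,\ \vartheta + \sum_{k=1}^{n} \theta^k x_k = x,\ \theta \ge 0,\ \theta \in \mathbb{R}^n,\ \vartheta \in \mathbb{R}^d_+ \Big\}.$$

Then, $h$ defines a convex, componentwise nonincreasing function whose effective domain
is $\mathrm{Conv}(x_1, \dots, x_n) + \mathbb{R}^d_+$. Moreover, if $\mathcal{K}_{x,z}$ is the collection of all componentwise
nonincreasing, proper convex functions $\psi$ such that $\psi(x_j) \le z^j$ for all $j = 1, \dots, n$, then
$h = \sup_{\psi \in \mathcal{K}_{x,z}} \{\psi\}$.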

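Proof: The proof that $h$ is convex is similar to the proof that $g$ is convex in Lemma A.1.
Now, if $x \le y \in \mathbb{R}^d$, observe that for any $\theta \in \mathbb{R}^n$, $\vartheta \in \mathbb{R}^d_+$ with $\sum_{k=1}^{n} \theta^k = 1$, $\vartheta + \sum_{k=1}^{n} \theta^k x_k =
x$, $\theta \ge 0$, we also have $\vartheta + (y - x) + \sum_{k=1}^{n} \theta^k x_k = y$ and $\vartheta + (y - x) \in \mathbb{R}^d_+$. Then, from the
definition of $h$ we see that $h(x) \ge h(y)$. Thus, $h$ is componentwise nonincreasing. That
the effective domain of $h$ is $\mathrm{Conv}(x_1, \dots, x_n) + \mathbb{R}^d_+$ is clear from the fact that for any $x$
not belonging to that set, the infimum defining $h(x)$ would be taken over the empty set.
Finally, for any $\psi \in \mathcal{K}_{x,z}$ and $x \in \mathrm{Conv}(x_1, \dots, x_n) + \mathbb{R}^d_+$ we have, for $\theta \in \mathbb{R}^n$ and $\vartheta \in \mathbb{R}^d_+$
with $\theta \ge 0$, $x = \vartheta + \sum_{j=1}^{n} \theta^j x_j$ and $\sum_{j=1}^{n} \theta^j = 1$,

$$\psi(x) \ \le\ \psi\Big(\sum_{j=1}^{n} \theta^j x_j\Big) \ \le\ \sum_{j=1}^{n} \theta^j \psi(x_j) \ \le\ \sum_{j=1}^{n} \theta^j z^j$$

since $\psi(x_j) \le z^j$ for any $j = 1, \dots, n$. The definition of $h$ as an infimum then implies that
$\psi(x) \le h(x)$ $\forall\, \psi \in \mathcal{K}_{x,z}$, $x \in \mathrm{Conv}(x_1, \dots, x_n) + \mathbb{R}^d_+$. The result then follows from the fact
that $h \in \mathcal{K}_{x,z}$. □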
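
The role of the slack vector $\vartheta$ can be illustrated with a small Python sketch, again restricted to $d = 1$. It uses the same linear-programming vertex argument as an assumption (it is not part of the text): an optimal solution either has zero slack and mixes at most two points, or it puts all weight on a single point $x_k \le x$ and absorbs the difference $x - x_k$ into the slack. The name `monotone_envelope_1d` and the tolerance `tol` are hypothetical.

```python
def monotone_envelope_1d(xs, zs, x, tol=1e-12):
    """Evaluate h(x) from Lemma A.2 when d = 1.

    h(x) = inf { sum_k theta_k * z_k : theta >= 0, sum_k theta_k = 1,
                 slack + sum_k theta_k * x_k = x, slack >= 0 }.
    """
    best = float("inf")  # x below every x_k gives an empty feasible set
    n = len(xs)
    # Case 1: all weight on one point x_k <= x, slack = x - x_k >= 0.
    for xk, zk in zip(xs, zs):
        if xk <= x + tol:
            best = min(best, zk)
    # Case 2: zero slack, x is a convex combination of two distinct points.
    for i in range(n):
        for j in range(i + 1, n):
            lo, hi = min(xs[i], xs[j]), max(xs[i], xs[j])
            if hi - lo > tol and lo - tol <= x <= hi + tol:
                t = (x - xs[i]) / (xs[j] - xs[i])  # weight placed on x_j
                best = min(best, (1 - t) * zs[i] + t * zs[j])
    return best
```

For `xs = [0.0, 1.0, 2.0]` and `zs = [1.0, 0.0, 1.0]` the values at 0, 0.5, 1.5 and 5 never increase, and the function is `inf` to the left of the smallest point, in line with the effective domain $\mathrm{Conv}(x_1, \dots, x_n) + \mathbb{R}_+$.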



A.2 Results from matrix algebra 

Before proving Lemma 4.1, we need the following result. 

Lemma A.3 Let $j \in \{1, \dots, d\}$, $\alpha \in \{-1, 1\}^d$ and $\rho_* > 0$. Then, the optimal value of the
optimization problem

$$\min\ \langle \alpha^j e_j,\ w_2 - w_1 \rangle$$



s.t. 



\Wi\ 



Wi, W2 G 



d 



/5^7=p* and it is attained at - -^^u^e; and wX - -^a- -^a-'e,-. 
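Proof: Writing $w = (w_1, w_2)$ with $w_1, w_2 \in \mathbb{R}^d$ for any $w \in \mathbb{R}^{2d}$, consider $f, g_1, g_2 : \mathbb{R}^{2d} \to \mathbb{R}$
defined as:

$$f(w) = \langle \alpha^j e_j,\ w_2 - w_1 \rangle,$$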



giiw) = 
gziw) = 



2 

1 (( p 



leVdl 

2 



j -lu/il^J, 



sVd 



3p* 

W2 —a 

8Vd 



Then, $f, g_1, g_2$ are twice continuously differentiable on $\mathbb{R}^{2d}$ and the optimization problem
can be re-written as minimizing $f(w)$ over the set $\{w \in \mathbb{R}^{2d} : g_1(w) \ge 0,\ g_2(w) \ge 0\}$.
The proof now follows by noting that the vector w * = * ; ) g IR^'* and the Lagrange 
multipliers A* = and A2 - are the only ones which satisfy the Karush-Kuhn- 
Tucker second order necessary and sufficient conditions for a strict local solution to this
problem as stated in Theorem 12.5, page 343 and Theorem 12.6, page 345 in Nocedal
and Wright (1999). □



A.2.1 Proof of Lemma 4.1 



Without loss of generality, we may assume that r = 1. Let R,- be and pick 5 g |o, j, 
= ^-5 and p* = Consider a matrix Z = {zi,...,Zd) £ K'^'"' with columns 

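$z_1, \dots, z_d \in \mathbb{R}^d$ and define the function $\xi : \mathbb{R}^{d \times d} \to \mathbb{R}^d$ as

$$\xi(Z) = \begin{vmatrix} e_1 & z_2^1 - z_1^1 & \cdots & z_d^1 - z_1^1 \\ \vdots & \vdots & & \vdots \\ e_d & z_2^d - z_1^d & \cdots & z_d^d - z_1^d \end{vmatrix}$$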

where the bars denote the determinant and the equation is written symbolically to express
that $\xi(Z)$ is a linear combination of the vectors $\{e_j\}_{1 \le j \le d}$ with the cofactor corresponding
to the $(j, 1)$-th position as the coefficient of $e_j$. This is a common notation
for "generalized vector products"; see, for instance, Courant and John (1999), Section
2.4.b, page 187 for more details. Since the determinant and all cofactors can be seen
as a continuous function on $\mathbb{R}^{d \times d}$, it follows that $\xi$ is continuous on $\mathbb{R}^{d \times d}$. Now choose
$\alpha \in \{-1, 1\}^d$ and observe that



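$$\xi(\alpha^1 e_1, \dots, \alpha^d e_d) = \Big(\prod_{k=1}^{d} \alpha^k\Big) \sum_{j=1}^{d} \alpha^j e_j,$$

$$|\xi(\alpha^1 e_1, \dots, \alpha^d e_d)| = \sqrt{d},$$

$$\langle \xi(\alpha^1 e_1, \dots, \alpha^d e_d),\ \alpha^j e_j \rangle = \prod_{k=1}^{d} \alpha^k \quad \forall\, j = 1, \dots, d.$$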
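
The generalized vector product described above can be checked numerically on small examples. The sketch below is an illustration under an assumption about the layout of the symbolic determinant: the first column holds $e_1, \dots, e_d$ and the remaining columns hold the differences $z_k - z_1$, so that the $j$-th coordinate of the result is the cofactor of position $(j, 1)$. The helper names `det` and `generalized_cross` are hypothetical, and the determinant is computed by Laplace expansion, which is only sensible for small $d$.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (small sizes only)."""
    n = len(m)
    if n == 0:
        return 1  # determinant of the empty matrix
    if n == 1:
        return m[0][0]
    total = 0
    for c in range(n):
        minor = [row[:c] + row[c + 1:] for row in m[1:]]
        total += (-1) ** c * m[0][c] * det(minor)
    return total


def generalized_cross(points):
    """Normal vector to the hyperplane through d points of R^d.

    Coordinate j is the (j, 1) cofactor of the matrix whose first column is
    symbolic and whose other columns are points[k] - points[0], k >= 1.
    """
    d = len(points)
    cols = [[p[i] - points[0][i] for i in range(d)] for p in points[1:]]
    xi = []
    for j in range(d):
        # Delete row j (and the symbolic first column) to get the minor.
        minor = [[col[i] for col in cols] for i in range(d) if i != j]
        xi.append((-1) ** j * det(minor))
    return xi
```

For the standard basis of $\mathbb{R}^3$ the result is the vector of ones, whose Euclidean norm is $\sqrt{3}$, and for any input the result is orthogonal to every difference `points[k] - points[0]` because the corresponding determinant has a repeated column.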



Since $\mathbb{R}^{d \times d}$ has the product topology of the $d$-fold topological product of $\mathbb{R}^d$ with itself,
the continuity of | and of (•, •> imply that we can find pa ^ \o,^-6^ such that if xj £ 
$B(\alpha^j e_j, \rho_\alpha)$ for any $j = 1, \dots, d$, $\beta = \{x_1, \dots, x_d\}$ and $X_\beta = (x_1, \dots, x_d)$, then

||(X^)|-V^ < 5, 
~ < 5, 



-a 



Taking this into account, define 



< 5 V j = l,...,d. 



(43) 
(44) 



U.p =11'^"' 1?,^ and fo„,/3 = <^a,/5,xi>. 

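From the definition of the function $\xi$ it is straightforward to see that $\langle \xi_{\alpha,\beta},\ x_j - x_1 \rangle = 0$
$\forall\, j \in \{1, \dots, d\}$, so we in fact have

$$x_1, \dots, x_d \in \mathcal{H}_{\alpha,\beta} := \{x \in \mathbb{R}^d : \langle \xi_{\alpha,\beta}, x \rangle = b_{\alpha,\beta}\}.$$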

Moreover, (43) and (44) imply

$$\frac{1}{\sqrt{d}} + \delta \ >\ b_{\alpha,\beta} \ >\ \frac{1}{\sqrt{d}} - \delta \ >\ 0,$$

For simplicity, and without loss of generality (the other cases follow from symmetry), 
we now assume that a = e, the vector of ones. By solving the corresponding quadratic 
programming problems, it is not difficult to see that 

1 



2d 



P* = —-S< ba,p 

V d 



inf {|x|} 
sup {|x|}. 



l-dVd mini<y<rf{|^^ „|} «„,^,x><V/i 



For the first inequality see, for instance, Exercise 16.2, page 484 of Nocedal and Wright
(1999). For the second one, one must notice that $2\sqrt{d} > \frac{1}{\sqrt{d}} + \delta > b_{\alpha,\beta}$ and that the
optimal value of the optimization problem must be attained at one of the vertices of
the polytope $\{x \in \mathbb{R}^d_+ : \langle \xi_{\alpha,\beta}, x \rangle \le b_{\alpha,\beta}\}$. The latter statement can be derived from the
Karush-Kuhn-Tucker conditions of the problem.

The inequalities in the last display imply that B(0,p*) c ^ and {x g IR"^ : |x| > 

Finally, for xGB(-a-'e;,ipQ:) we have |x+Xj| < pa and therefore <^q;,/3,x> < -{^a,p>Xj) + 
Pa < 6 - ^ + Pa < 0. We can then take any p < ^mmae^_j^ ^d{pa} to make ii)-{vi) be 
true. We'll now argue that by making p smaller, if required, ivii) also holds. 

Let Bi=B (o, Y^)' ^2 = (ItI^- ^) and consider the functions (p,\f/ ■.U'^'''^ ^ R 
given by 

(piX) = inf \ min \{X{W2-Wi)y\ 
y/iX) = sup max I (X 1^1)4 k 

Both of these functions are Lipschitz continuous with the metric induced by the $\|\cdot\|_2$-norm
on $\mathbb{R}^{d \times d}$ with Lipschitz constants smaller than $\rho_*$. To see this, observe that

\X{W2 - wi) - Y{W2 -wi)\< \\X-Y\\2\W2 -wi\< \\X-Y\\2 

lb 

for all $w_1 \in B_1$, $w_2 \in B_2$ and $X, Y \in \mathbb{R}^{d \times d}$. Also, simple algebra shows that
$|\min_{1 \le j \le d}\{x^j\} - \min_{1 \le j \le d}\{y^j\}| \le |x - y|$ $\forall\, x, y \in \mathbb{R}^d$. From these assertions, one immediately gets the
Lipschitz continuity of $\varphi$. Similar arguments show the same for $\psi$.
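
The "simple algebra" behind the bound on the difference of minima can be sketched as follows. This is one possible argument, not necessarily the authors': $j^*$ is notation introduced here for an index at which $y$ attains its minimum, and $|\cdot|$ is the Euclidean norm, which dominates the absolute value of each coordinate.

```latex
\begin{align*}
\min_{1 \le j \le d} \{x^j\}
  \;\le\; x^{j^*}
  \;=\; y^{j^*} + \bigl(x^{j^*} - y^{j^*}\bigr)
  \;\le\; \min_{1 \le j \le d} \{y^j\} + |x - y| .
\end{align*}
```

Exchanging the roles of $x$ and $y$ gives the reverse inequality, and the two together yield the stated bound.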

Let j^a G IR'^'"^ be the diagonal matrix whose j'th diagonal element is precisely 
From Lemma A.3 it is seen that (piJ^a) = the other hand, it is immediately 

obvious that xf/^J^a) - j^^- Using one more time the continuity of y/ and (p and that the 
topology in K'^'''* is the same as the topology of the d-fold topological product of IR*^, for 
each a G {-1, l}*^ we can find ra for which Xp = (xi, . . . , x^) g IR'^''^ and \xj - a^ej] < ra 

fo^ J = 1 ^ i^^piy 1^^^^"'^ - ifei < ife mx-^^) - ^1 < It follows 
that



inf I 


min 


f>i 1 


[i<j<d 




inf J 


min 


f>i 1 


[i<j<d 


wieBi,W2&B2 



The proof is then finished by taking $\rho < \min_{\alpha \in \{-1,1\}^d} \{r_\alpha \wedge \rho_\alpha\}$. □



A.2.2 Proof of Lemma 4.2 

Assume again, without loss of generality, that r = 1. Lemma 4.1 and ivi) imply that 
x^jj,x_i^ij e{xeW^ : {x,£,a) < ba) for any j - 1, n andany a g {-1, l}"^. It follows that, 
in addition to being convex, (^ae{-\,i}d^^^ ^ '■ ^^a, x) < ba) contains {x+i, . . . , x+d) and 
hence it must contain K. For the other contention, take x g n^^j.^ ^jdlw g IR*^ : {^a, if) ^ 
ba) with X 7^ and any a g {-1, l}*^ for which x g S^a- Then, {^a, x> > for otherwise we 
would have 

?CXG^„\^+ V?c>0 

which is impossible by (i;) in Lemma 4.L Thus, = {a g {-1, l}*^ : {^a> x) >0} ^ and 
we can define 



• f \ . . f ba \ 

Tx = mm < — > and = argmin < — > . 



xe^^ I (^a> X) 

Note that > L Since jSa^ is a basis, there is g K'^ such that r^x = O^x^i-^ + ... + d'^x^d^. 
But then, 

ba. = <rxx,e«,> = eHx^,,,uj = ba. 

where the last equality follows from of Lemma4.1 andtherefore0^+...+0^ = L Now 
assume that 6^ < for some j e{\,...,d} and set jx g {-1, l}*^ with = for j and 
ri = -ai. But then, Lkjijd'' = 1-6^ > I, {x^kj^,^jj = bj^ for k^j and {xj ^^yj < by
iii) and {vi) in Lemma 4.1. Therefore, 

{r,x,^y^) = eHx_jM.)+j:0hx^k,,^jJ (45) 
> Y.d\x^k^,^,;)>b,^ (46) 

kjij 

which is impossible because it contradicts the definition of r^. Hence, > and we 
have rxX g Conv[Pa^)- Note that since belongs in the interior of ^ae{-i,\]''^^ ^ '■ 
{^a, uj) < ba), there there is jc > such that -kx g f^ae{-i,i}'i^^ ^ '■ ^^a, if) ^ ba}- 
Applying the same arguments as before to -kx instead of x, we can find fx > and 
dtx £ {-1, l}"^ such that -fxX g Conf (jSa J- It follows that -f^x, r^x g K and therefore 
0,xeK since > 1. Hence, we have proved (/). 

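To prove (ii), note that $A := \bigcap_{\alpha \in \{-1,1\}^d} \{w \in \mathbb{R}^d : \langle \xi_\alpha, w \rangle < b_\alpha\}$ is open and, by (i), it is
contained in $K$. Thus, $A \subset K^\circ$. That $K^\circ \subset A$ follows from the fact that if $x \in K \setminus A$, then
$\langle \xi_\alpha, x \rangle = b_\alpha$ for some $\alpha \in \{-1, 1\}^d$, which implies that $B(x, \tau) \cap \mathrm{Ext}(K) \ne \emptyset$ for all $\tau > 0$
and hence $x \notin K^\circ$.

It is then obvious that (iv) follows from the identity $\partial K = K \setminus K^\circ$ and the fact that $K$
is closed.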

Pick any $\alpha \in \{-1, 1\}^d$ and observe that (ii) and (vi) from Lemma 4.1 imply that for
any $\gamma \in \{-1, 1\}^d$ we have

- by if 7^ = 
^0<by if7'^ = -a^ 
which by (iv) of this lemma shows that

x^jj e{weR'^: (U, w) = ba) n (n^g{_i ^jdju; g : {^j, w) < fo^jj 

for all $\alpha \in \{-1, 1\}^d$ and $j = 1, \dots, d$. Since the sets on the right-hand side of the last
display are all convex we can conclude that
{^j,X^ki^) 1



for all a e {-1,1}'^. Thus, \J^^^_i ^dConv\Xayi,---,Xg^j c dK. Finally, take x g dK. 
Then, there is g {-1, 1}^ such that {^a^,x) - ba^- Since is a basis we can again 
find 6 eW^ such that x = 0^x^i ^ + . . . + O'^x d^. Just as before, (^a^;. x j .> = ba^ implies 

X X ^xJ 

that Y.dj = I. And again, if 6^ < for some j, we can take ^ {- 1. 1}*^ with = for 
j and = -a^ ^nd arrive at a contradiction with similar arguments to those used 
in (45) and (46). This shows that x e Conu[l5a^] and completes the proof as {v) and ivi) 
are direct consequences of (/) - (/ v) and Lemma 4.1. □ 

A.2.3 Proof of Lemma 4.3 

Let r e (0, ^) if d > 3 and r > if d < 2. Since the geometric properties of any rectangle 
depend only on the direction and magnitude of the diagonal, we may assume without 
loss of generality that b> and that a = j^b. This is because we can define = (1 + 
r){b- a) > and d = a - r{b- a) to obtain [a,b] = d + [-^b,b]. For any a g {-1,1}'^, 
define aj = a- la^ej g IR"^ and Wa = Za + r{Za - Z-a)- Additionally, define the functions 
i/Aa,^a:IR'^'''^x[R'^^[Rby 

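$$\psi_\alpha(\Theta, \theta) = \langle e,\ \Theta(z_\alpha - \theta) \rangle,$$

$$\varphi_\alpha(\Theta, \theta) = \min_{1 \le j \le d} \big\{ (\Theta(z_\alpha - \theta))^j \big\}.$$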

Considering $\mathbb{R}^{d \times d}$ with the topology generated by the $\|\cdot\|_2$ norm and $\mathbb{R}^{d \times d} \times \mathbb{R}^d$ with
the product topology, it is easily seen that both functions defined in the last display are
continuous. Now, let Wa g K'^''^ be the matrix whose j'th column is precisely Waj - Wa- 
lt is not difficult to see that y/ai^a^, = < 1 and (pa{W~^, Wa) - ^fz? ^ ^- 
instance, one can check that for a = -e, one has Wa = and Waj = ^j^bJej and the 
result is now evident. By symmetry, the same is true for any a g {-1, 1}"^. Therefore, for 
any a g {- 1, l}"^ there is pa such that whenever {Xaj - Waj \< Pa^ j - ^,---,d and Xa is 
the matrix whose j'th column is Xaj - Xa, we get



$$\psi_\alpha(X_\alpha^{-1}, x_\alpha) < 1, \tag{47}$$

$$\varphi_\alpha(X_\alpha^{-1}, x_\alpha) > 0. \tag{48}$$



Letting p = min 



ae{-l,ll'' 



{pa} completes the proof as (47) and (48) imply Za^Conv [xa, x, 



□ 
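
One way to read conditions (47) and (48) is as a barycentric-coordinate test: if the columns of a matrix are the edge vectors from one vertex of a simplex to the other $d$ vertices, then a point lies in the interior of the simplex exactly when its coordinates with respect to those edges are all positive and sum to less than one. The Python sketch below illustrates that reading only; it is an assumption about how the two functions are used, and the names `solve` and `in_open_simplex` are hypothetical. The linear system is solved by Gauss-Jordan elimination so that the block needs no external library.

```python
def solve(A, b):
    """Solve the d x d system A c = b by Gauss-Jordan elimination with pivoting."""
    d = len(A)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(d)]
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("edge vectors are linearly dependent")
        M[col], M[piv] = M[piv], M[col]
        for r in range(d):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [mr - factor * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][d] / M[i][i] for i in range(d)]


def in_open_simplex(vertex, others, z):
    """Check whether z is interior to Conv(vertex, others[0], ..., others[d-1]).

    Returns (inside, psi, phi) where, with X the matrix whose j-th column is
    others[j] - vertex and c = X^{-1}(z - vertex):
      psi = sum of the entries of c, phi = smallest entry of c.
    """
    d = len(vertex)
    X = [[others[j][i] - vertex[i] for j in range(d)] for i in range(d)]
    rhs = [z[i] - vertex[i] for i in range(d)]
    c = solve(X, rhs)
    psi, phi = sum(c), min(c)
    return (psi < 1 and phi > 0), psi, phi
```

For the triangle with vertices $(0,0)$, $(1,0)$, $(0,1)$, the point $(0.25, 0.25)$ passes both checks, while $(0.75, 0.75)$ fails the first and $(-0.1, 0.5)$ fails the second.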